2 changes: 1 addition & 1 deletion samples/README.md
@@ -10,5 +10,5 @@ Explore complete working examples that demonstrate how to use Foundry Local —
|----------|---------|-------------|
| [**C#**](cs/) | 13 | .NET SDK samples including native chat, embeddings, audio transcription, tool calling, model management, web server, and tutorials. Uses WinML on Windows for hardware acceleration. |
| [**JavaScript**](js/) | 13 | Node.js SDK samples including native chat, embeddings, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. |
| [**Python**](python/) | 10 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, and tutorials. |
| [**Python**](python/) | 11 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, Responses API, and tutorials. |
| [**Rust**](rust/) | 9 | Rust SDK samples including native chat, embeddings, audio transcription, tool calling, web server, and tutorials. |
1 change: 1 addition & 0 deletions samples/python/README.md
@@ -14,6 +14,7 @@ These samples demonstrate how to use Foundry Local with Python.
| [embeddings](embeddings/) | Generate single and batch text embeddings using the Foundry Local SDK. |
| [audio-transcription](audio-transcription/) | Transcribe audio files using the Whisper model. |
| [web-server](web-server/) | Start a local OpenAI-compatible web server and call it with the OpenAI Python SDK. |
| [web-server-responses](web-server-responses/) | Start a local OpenAI-compatible web server and call it with the Responses API, including streaming and tool calling. |
| [tool-calling](tool-calling/) | Tool calling with custom function definitions (get_weather, calculate). |
| [langchain-integration](langchain-integration/) | LangChain integration for building translation and text generation chains. |
| [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). |
45 changes: 45 additions & 0 deletions samples/python/web-server-responses-vision/README.md
@@ -0,0 +1,45 @@
# Foundry Local Python Vision Sample (Responses API)

This sample demonstrates vision (image understanding) capabilities using the Foundry Local web service and the OpenAI Responses API.

It shows:

- Streaming a vision response via the Responses API (the request payload is sketched below)
- Falling back to a default test image (`src/test_image.jpg`) when no image path is provided
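
The streamed request wraps a base64-encoded JPEG in the local service's input schema. Here is a minimal sketch; `base_url`, `model_id`, and `image_b64` stand in for values the runnable script (`src/app.py`) sets up:

```python
from openai import OpenAI

client = OpenAI(base_url=base_url, api_key="notneeded")  # local service ignores the key

vision_input = [
    {
        "type": "message",
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Describe this image."},
            {"type": "input_image", "image_data": image_b64, "media_type": "image/jpeg"},
        ],
    }
]

# The payload travels via extra_body, which the client merges into the
# request body in place of the placeholder "input".
stream = client.responses.create(
    model=model_id,
    input="placeholder",
    extra_body={"input": vision_input},
    stream=True,
)
for event in stream:
    if getattr(event, "type", None) == "response.output_text.delta":
        print(getattr(event, "delta", ""), end="", flush=True)
```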

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk`
- `openai`
- `Pillow` (for image resizing)

The sample downloads the specified model the first time it runs and skips the download if the model is already cached.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py qwen3.5-0.8b
```

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

You can also pass a custom image path as the second argument.

The sample starts the local web service, sends vision requests via the Responses API to `http://localhost:<port>/v1`, prints the model output, and then stops the web service.
3 changes: 3 additions & 0 deletions samples/python/web-server-responses-vision/requirements.txt
@@ -0,0 +1,3 @@
foundry-local-sdk
openai
Pillow
107 changes: 107 additions & 0 deletions samples/python/web-server-responses-vision/src/app.py
@@ -0,0 +1,107 @@
# <complete_code>
# <imports>
import base64
import io
import os
import sys

from PIL import Image
from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>

if len(sys.argv) < 2:
print("Usage: python src/app.py <model_alias> [image_path]")
print(" Example: python src/app.py qwen3.5-0.8b")
sys.exit(1)

model_alias = sys.argv[1]
default_image = os.path.join(os.path.dirname(__file__), "test_image.jpg")
image_path = sys.argv[2] if len(sys.argv) > 2 else default_image

def resize_and_encode(path, max_dim=512):
    """Load and resize a local image, returning (base64_str, media_type)."""
    img = Image.open(path)
    if max(img.size) > max_dim:
        img.thumbnail((max_dim, max_dim))
        print(f" (resized to {img.size[0]}x{img.size[1]})")
    if img.mode != "RGB":
        img = img.convert("RGB")  # JPEG cannot store alpha or palette modes
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode(), "image/jpeg"


# <init>
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance
# </init>

# <model_setup>
model = manager.catalog.get_model(model_alias)
if model is None:
available = [m.alias for m in manager.catalog.list_models()]
print(f"\nModel '{model_alias}' not found in catalog.")
print(f"Available models: {available}")
sys.exit(1)

if not model.is_cached:
print(f"\nDownloading model {model_alias}...")
model.download(
lambda progress: print(f"\rDownloading model: {progress:.2f}%", end="", flush=True)
)
print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(base_url=base_url, api_key="notneeded")
# </server_setup>

# <inference>
print(f"\nPreparing image: {image_path}")
image_b64, media_type = resize_and_encode(image_path)

vision_input = [
{
"type": "message",
"role": "user",
"content": [
{"type": "input_text", "text": "Describe this image."},
{
"type": "input_image",
"image_data": image_b64,
"media_type": media_type,
},
],
}
]

print("\nStreaming vision response...")
stream = openai.responses.create(
model=model.id,
input="placeholder",
extra_body={"input": vision_input},
stream=True,
)

print("[ASSISTANT]: ", end="", flush=True)
for event in stream:
if getattr(event, "type", None) == "response.output_text.delta":
print(getattr(event, "delta", ""), end="", flush=True)
print()
# </inference>

# <<<<<< END OPENAI SDK USAGE >>>>>>
openai.close()
manager.stop_web_service()
model.unload()
# </complete_code>
Binary file added: samples/python/web-server-responses-vision/src/test_image.jpg (not rendered in the diff view)
44 changes: 44 additions & 0 deletions samples/python/web-server-responses/README.md
@@ -0,0 +1,44 @@
# Foundry Local Python Responses Web-Service Sample

This sample starts the Foundry Local OpenAI-compatible web service, then calls the Responses API with the official OpenAI Python client.

It demonstrates:

- A non-streaming `/v1/responses` call (see the sketch after this list)
- A streaming `/v1/responses` call
- A function/tool-calling round trip using `previous_response_id`
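
As a quick sketch of these calls (assuming the web service is already running and `base_url`, `model_id`, and `tools` are set up as in `src/app.py`):

```python
from openai import OpenAI

client = OpenAI(base_url=base_url, api_key="notneeded")  # local service ignores the key

# Non-streaming call: the full response arrives at once.
response = client.responses.create(
    model=model_id,
    input="Reply with one short sentence about local AI.",
)
print(response.output_text)

# Tool-calling round trip: store the first response, then reference it
# via previous_response_id while supplying the tool's output.
first = client.responses.create(
    model=model_id,
    input="Use the get_weather tool and then answer with the weather.",
    tools=tools,
    tool_choice="required",
    store=True,
)
call = next(item for item in first.output if item.type == "function_call")
final = client.responses.create(
    model=model_id,
    previous_response_id=first.id,
    input=[{
        "type": "function_call_output",
        "call_id": call.call_id,
        "output": '{"location": "Seattle", "weather": "72 degrees F and sunny"}',
    }],
    tools=tools,
)
print(final.output_text)
```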

## What gets installed

Install the sample dependencies from `requirements.txt`:

```bash
pip install -r requirements.txt
```

That installs:

- `foundry-local-sdk` on non-Windows platforms
- `foundry-local-sdk-winml` on Windows
- `openai`

The sample downloads/registers Foundry Local execution providers and downloads the `qwen2.5-0.5b` model the first time it runs.

## Run the sample

From this directory:

```bash
python -m venv .venv
.\.venv\Scripts\activate
pip install -r requirements.txt
python src\app.py
```

On macOS or Linux, activate the virtual environment with:

```bash
source .venv/bin/activate
```

The sample starts the local web service, sends Responses API requests to `http://localhost:<port>/v1`, prints the model output, and then unloads the model and stops the web service.
3 changes: 3 additions & 0 deletions samples/python/web-server-responses/requirements.txt
@@ -0,0 +1,3 @@
foundry-local-sdk; sys_platform != "win32"
foundry-local-sdk-winml; sys_platform == "win32"
openai
152 changes: 152 additions & 0 deletions samples/python/web-server-responses/src/app.py
@@ -0,0 +1,152 @@
# <complete_code>
# <imports>
import json
from typing import Any

from openai import OpenAI

from foundry_local_sdk import Configuration, FoundryLocalManager
# </imports>


def get_response_text(response: Any) -> str:
    """Return the text of a Responses API result, preferring the convenience
    `output_text` property and falling back to the structured output items."""
    if isinstance(getattr(response, "output_text", None), str):
        return response.output_text
    return "".join(
        getattr(part, "text", "")
        for item in getattr(response, "output", []) or []
        for part in getattr(item, "content", []) or []
        if getattr(part, "type", None) == "output_text"
    )


# <init>
# Initialize the Foundry Local SDK
config = Configuration(app_name="foundry_local_samples")
FoundryLocalManager.initialize(config)
manager = FoundryLocalManager.instance

# Download and register all execution providers.
current_ep = ""


def _ep_progress(ep_name: str, percent: float):
global current_ep
if ep_name != current_ep:
if current_ep:
print()
current_ep = ep_name
print(f"\r {ep_name:<30} {percent:5.1f}%", end="", flush=True)


manager.download_and_register_eps(progress_callback=_ep_progress)
if current_ep:
print()
# </init>

# <model_setup>
model_alias = "qwen2.5-0.5b"
model = manager.catalog.get_model(model_alias)

print(f"\nDownloading model {model_alias}...")
model.download(
lambda progress: print(
f"\rDownloading model: {progress:.2f}%",
end="",
flush=True,
)
)
print("\nModel downloaded")

print("\nLoading model...")
model.load()
print("Model loaded")
# </model_setup>

# <server_setup>
print("\nStarting web service...")
manager.start_web_service()
base_url = manager.urls[0].rstrip("/") + "/v1"
print("Web service started")

# <<<<<< OPENAI SDK USAGE >>>>>>
# Use the OpenAI SDK to call the local Foundry web service Responses API
openai = OpenAI(
base_url=base_url,
api_key="notneeded",
)
# </server_setup>

try:
print("\nTesting a non-streaming Responses call...")
response = openai.responses.create(
model=model.id,
input="Reply with one short sentence about local AI.",
)
print(f"[ASSISTANT]: {get_response_text(response)}")

print("\nTesting a streaming Responses call...")
stream = openai.responses.create(
model=model.id,
input="Count from one to three.",
stream=True,
)

print("[ASSISTANT STREAM]: ", end="", flush=True)
for event in stream:
if getattr(event, "type", None) == "response.output_text.delta":
print(getattr(event, "delta", ""), end="", flush=True)
print()

print("\nTesting Responses tool calling...")
tools = [
{
"type": "function",
"name": "get_weather",
"description": "Get the current weather. This sample always returns Seattle weather.",
"parameters": {
"type": "object",
"properties": {},
"additionalProperties": False,
},
},
]

tool_response = openai.responses.create(
    model=model.id,
    input="Use the get_weather tool and then answer with the weather.",
    tools=tools,
    tool_choice="required",
    store=True,  # persist the response so previous_response_id can reference it
)

function_call = next(
    (
        item
        for item in getattr(tool_response, "output", []) or []
        if getattr(item, "type", None) == "function_call"
    ),
    None,
)
if function_call is None:
raise RuntimeError("Expected the model to call get_weather.")

print(f"[TOOL CALL]: {function_call.name}({function_call.arguments})")

final_response = openai.responses.create(
model=model.id,
previous_response_id=tool_response.id,
input=[
{
"type": "function_call_output",
"call_id": function_call.call_id,
"output": json.dumps({"location": "Seattle", "weather": "72 degrees F and sunny"}),
}
],
tools=tools,
)

print(f"[ASSISTANT FINAL]: {get_response_text(final_response)}")
# <<<<<< END OPENAI SDK USAGE >>>>>>
finally:
# Tidy up
openai.close()
manager.stop_web_service()
model.unload()
# </complete_code>