This directory contains examples for deploying Mellea programs as API services using the `m serve` CLI command.
A simple example showing how to structure a Mellea program for serving as an API.
Key Features:
- Defining a `serve()` function that takes input and returns output
- Using requirements and sampling strategies in served programs
- Custom validation functions for API constraints
- Handling chat message inputs
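As a sketch of the kind of custom validation function such a program might use, here is a plain-Python predicate enforcing a word budget. The function name and the limit are illustrative, not part of the example's actual code; wiring a predicate like this into a Mellea requirement depends on the requirement API your version exposes.

```python
def within_word_limit(text: str, limit: int = 100) -> bool:
    """Return True when the response stays within the word budget.

    A predicate like this can back a requirement so that rejection
    sampling retries generations that violate the constraint.
    """
    return len(text.split()) <= limit


# The predicate is plain Python, so it is easy to unit test:
assert within_word_limit("a short answer")
assert not within_word_limit("word " * 200)
```

Keeping validators as small pure functions makes them trivially testable outside the served program.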
A dedicated streaming example for `m serve` that supports both modes:
- `stream=False` returns a normal computed response
- `stream=True` returns an uncomputed thunk so the server can emit incremental Server-Sent Events (SSE) chunks
Example of serving a PII (Personally Identifiable Information) detection service.
Client code for testing the served API endpoints with non-streaming requests.
Client code demonstrating streaming responses using Server-Sent Events (SSE) against `m_serve_example_streaming.py`.
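For reference, the wire format a streaming client consumes is standard SSE: each event is a `data:` line carrying a JSON chunk, and the stream ends with a `data: [DONE]` sentinel. A minimal parser for that framing (a sketch independent of any client library; the sample lines below are illustrative) might look like:

```python
import json


def parse_sse_lines(lines):
    """Yield decoded JSON chunks from an iterable of SSE lines.

    Skips blank keep-alive lines and stops at the [DONE] sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line or not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


raw = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(
    c["choices"][0]["delta"]["content"] for c in parse_sse_lines(raw)
)
print(text)  # -> Hello
```

In practice an OpenAI-compatible client library does this parsing for you, as the streaming example later in this document shows.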
- API Deployment: Exposing Mellea programs as REST APIs
- Input Handling: Processing structured inputs (chat messages, requirements)
- Output Formatting: Returning appropriate response types
- Validation in Production: Using requirements in deployed services
- Model Options: Passing model configuration through API
- Streaming Responses: Real-time token streaming via Server-Sent Events (SSE)
```python
from mellea import start_session
from mellea.stdlib.sampling import RejectionSamplingStrategy

session = start_session()


def serve(
    input: list[ChatMessage],
    requirements: list[str] | None = None,
    model_options: dict | None = None,
):
    """Main serving function - called by m serve."""
    # `input` is the chat history; the last message is the user's latest turn.
    message = input[-1].content
    result = session.instruct(
        description=message,
        requirements=requirements or [],
        strategy=RejectionSamplingStrategy(loop_budget=3),
        model_options=model_options,
    )
    return result
```

```shell
# Start the sampling example server
m serve docs/examples/m_serve/m_serve_example_simple.py

# In another terminal, test with the non-streaming client
python docs/examples/m_serve/client.py
```

```shell
# Start the dedicated streaming example server
m serve docs/examples/m_serve/m_serve_example_streaming.py

# In another terminal, test with the streaming client
python docs/examples/m_serve/client_streaming.py
```

The server supports streaming responses via Server-Sent Events (SSE) when the `stream=True` parameter is set in the request. This allows clients to receive tokens as they are generated, providing a better user experience for long-running generations.
For a real streaming demo, serve `m_serve_example_streaming.py`. That example supports both normal and streaming responses consistently. The sampling example (`m_serve_example_simple.py`) demonstrates rejection sampling and validation, not token-by-token streaming.
Key Features:
- Real-time token streaming using SSE
- OpenAI-compatible streaming format (`ChatCompletionChunk`)
- Final chunk includes usage statistics when the backend provides usage data
- The dedicated streaming example supports both `stream=False` and `stream=True`
- Works with any backend that supports `ModelOutputThunk.astream()`
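As a sketch of how a client can fold those chunks back into a complete message while picking up the final usage statistics, the helper below walks OpenAI-style chunk dicts. The sample chunks are illustrative; none of this is Mellea-specific code.

```python
def accumulate_chunks(chunks):
    """Collect delta contents and the trailing usage dict, if any."""
    parts = []
    usage = None
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):  # the final chunk may carry usage stats
            usage = chunk["usage"]
    return "".join(parts), usage


chunks = [
    {"choices": [{"delta": {"content": "Once "}}]},
    {"choices": [{"delta": {"content": "upon a time"}}]},
    {"choices": [], "usage": {"total_tokens": 7}},
]
text, usage = accumulate_chunks(chunks)
print(text)   # -> Once upon a time
print(usage)  # -> {'total_tokens': 7}
```

When using an OpenAI client, the same folding applies to `chunk.choices[0].delta.content` attributes instead of dict keys.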
Example:

```python
import openai

client = openai.OpenAI(api_key="na", base_url="http://0.0.0.0:8080/v1")

# Enable streaming with stream=True
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model="granite4.1:3b",
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

The `m serve` command automatically creates:

- `POST /generate`: Main generation endpoint
- `GET /health`: Health check endpoint
- `GET /docs`: API documentation (Swagger UI)
- Production Deployment: Deploy Mellea programs as microservices
- API Integration: Integrate with existing systems via REST API
- Scalability: Run multiple instances behind a load balancer
- Monitoring: Add logging and metrics to served programs
- See `cli/serve/` for server implementation
- See `mellea/stdlib/session.py` for session management