M Serve Examples

This directory contains examples for deploying Mellea programs as API services using the m serve CLI command.

Files

m_serve_example_simple.py

A simple example showing how to structure a Mellea program for serving as an API.

Key Features:

Defining a serve() function that takes input and returns output
Using requirements and sampling strategies in served programs
Custom validation functions for API constraints
Handling chat message inputs
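
As a rough illustration of the "custom validation function" idea, the sketch below is a plain Python predicate that checks a generated string against an API constraint. The name `fits_api_limits` and its limits are hypothetical and do not come from the example file. Wiring the predicate into a Mellea requirement is left out because it depends on the installed Mellea version.

```python
def fits_api_limits(text: str, max_chars: int = 500, max_lines: int = 10) -> bool:
    """Return True when `text` is non-empty and within the size limits.

    Hypothetical helper for illustration; the default limits are arbitrary.
    """
    stripped = text.strip()
    if not stripped:
        return False  # reject empty or whitespace-only output
    if len(stripped) > max_chars:
        return False  # reject output that is too long for the API contract
    return len(stripped.splitlines()) <= max_lines
```

A predicate like this returns a plain boolean, so it can be unit tested without a model before it is attached to a served program.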

m_serve_example_streaming.py

A dedicated streaming example for m serve that supports both modes:

stream=False returns a normal computed response
stream=True returns an uncomputed thunk so the server can emit incremental Server-Sent Events (SSE) chunks

pii_serve.py

Example of serving a PII (Personally Identifiable Information) detection service.

client.py

Client code for testing the served API endpoints with non-streaming requests.

client_streaming.py

Client code demonstrating streaming responses using Server-Sent Events (SSE) against m_serve_example_streaming.py.

Concepts Demonstrated

API Deployment: Exposing Mellea programs as REST APIs
Input Handling: Processing structured inputs (chat messages, requirements)
Output Formatting: Returning appropriate response types
Validation in Production: Using requirements in deployed services
Model Options: Passing model configuration through the API
Streaming Responses: Real-time token streaming via Server-Sent Events (SSE)

Basic Pattern

from mellea import start_session
from mellea.stdlib.sampling import RejectionSamplingStrategy
from mellea.core import Requirement
# Assumed import path for the chat message type; adjust to your Mellea version.
from cli.serve.models import ChatMessage

# One session is created at import time and shared by all requests.
session = start_session()

def serve(input: list[ChatMessage],
          requirements: list[str] | None = None,
          model_options: dict | None = None):
    """Main serving function - called by m serve."""
    # Use the most recent chat message as the instruction text.
    message = input[-1].content

    result = session.instruct(
        description=message,
        requirements=requirements or [],
        strategy=RejectionSamplingStrategy(loop_budget=3),
        model_options=model_options
    )
    return result

Running the Server

Sampling

# Start the sampling example server
m serve docs/examples/m_serve/m_serve_example_simple.py

# In another terminal, test with the non-streaming client
python docs/examples/m_serve/client.py
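
Once the server is running, a request can also be built without the openai package. The sketch below uses only the standard library and assumes the server accepts an OpenAI-style chat completion body at `<base_url>/chat/completions`, which is the path the openai client derives from a base URL ending in `/v1`. The helper names `build_chat_request` and `send_chat_request` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    base_url: str, prompt: str, model: str, stream: bool = False
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send a non-streaming request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `send_chat_request(build_chat_request("http://localhost:8080/v1", "Write a haiku", "granite4.1:3b"))` would return the parsed response, assuming the server listens on port 8080 as in the client example in this README.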

Streaming

# Start the dedicated streaming example server
m serve docs/examples/m_serve/m_serve_example_streaming.py

# In another terminal, test with the streaming client
python docs/examples/m_serve/client_streaming.py

Streaming Support

The server streams responses via Server-Sent Events (SSE) when the request sets stream=True. Clients then receive tokens as they are generated instead of waiting for the full completion, which matters most for long generations.

For a real streaming demo, serve m_serve_example_streaming.py. That example supports both normal and streaming responses consistently. The sampling example (m_serve_example_simple.py) demonstrates rejection sampling and validation, not token-by-token streaming.

Key Features:

Real-time token streaming using SSE
OpenAI-compatible streaming format (ChatCompletionChunk)
Final chunk includes usage statistics when the backend provides usage data
The dedicated streaming example supports both stream=False and stream=True
Works with any backend that supports ModelOutputThunk.astream()

Example:

import openai

client = openai.OpenAI(api_key="na", base_url="http://0.0.0.0:8080/v1")

# Enable streaming with stream=True
stream = client.chat.completions.create(
    messages=[{"role": "user", "content": "Tell me a story"}],
    model="granite4.1:3b",
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
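
The openai client hides the wire format. For clients that read the SSE stream by hand, the sketch below shows one way to parse it, assuming the usual OpenAI-style framing: each event is a `data: <json>` line and the stream ends with `data: [DONE]`. Whether m serve emits the `[DONE]` sentinel is an assumption here, and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON payload of each `data:` line, stopping at `[DONE]`."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel (assumed OpenAI-style)
        yield json.loads(data)


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join all content deltas and return them with the last usage block seen."""
    parts = []
    usage = None
    for chunk in iter_sse_payloads(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage"):
            usage = chunk["usage"]  # usually only present on the final chunk
    return "".join(parts), usage
```

`collect_stream` accepts any iterable of decoded text lines, so it can be fed from a file, a list in a test, or a line iterator over an HTTP response.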

API Endpoints

The m serve command automatically creates:

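POST /generate: Main generation endpoint. Note that the openai client example above instead posts to /v1/chat/completions, the path that client derives from its base_url.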
GET /health: Health check endpoint
GET /docs: API documentation (Swagger UI)
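
A health endpoint is typically polled before sending traffic. The sketch below is a minimal standard-library probe for `GET /health`. It assumes only that a healthy server answers with HTTP 200 and makes no assumption about the response body. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(base_url: str) -> str:
    """Return the health-check URL for a server root such as http://localhost:8080."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False
```

Returning a boolean instead of raising keeps the probe easy to use in a retry loop or a startup script.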

Use Cases

Production Deployment: Deploy Mellea programs as microservices
API Integration: Integrate with existing systems via REST API
Scalability: Run multiple instances behind a load balancer
Monitoring: Add logging and metrics to served programs
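
For the monitoring use case, one possible approach is to wrap the serving function in a decorator that logs the duration and outcome of each call. The sketch below uses only the standard library; the decorator name `log_calls` and the logger name are hypothetical. It covers synchronous functions only, and whether m serve accepts a wrapped serve function unchanged is an assumption to verify against your Mellea version.

```python
import functools
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("m_serve_example")


def log_calls(func: Callable[..., Any]) -> Callable[..., Any]:
    """Log the duration and outcome of every call to `func`."""

    @functools.wraps(func)  # keep the original name, docstring, and signature
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed = time.perf_counter() - start
            logger.exception("%s failed after %.3fs", func.__name__, elapsed)
            raise  # re-raise so the caller still sees the error
        elapsed = time.perf_counter() - start
        logger.info("%s succeeded in %.3fs", func.__name__, elapsed)
        return result

    return wrapper
```

Applying `@log_calls` above a function definition is enough to get one log record per call; an async serving function would need an `async def` wrapper instead.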
