Session 5: Build AI-Powered Agents Fast with Foundry Local

Note: Agent capabilities in Foundry Local are still evolving. Confirm support in the latest release notes before implementing advanced patterns.

Overview

Use Foundry Local to rapidly prototype agentic applications: system prompts, grounding, and orchestration patterns. When agent support is present, you can standardize on OpenAI-compatible function calling or use Azure AI Agents on the cloud side in hybrid designs.

🔄 Updated for Modern SDK: This module has been aligned with the latest Microsoft Foundry-Local repository patterns and matches the comprehensive implementation in samples/05/. The examples now use the modern foundry-local-sdk and OpenAI client instead of manual requests.

🏗️ Architecture Highlights:

  • Specialist Agents: Retrieval, Reasoning, and Execution agents with distinct capabilities
  • Coordinator Pattern: Orchestrates multi-agent workflows with feedback loops
  • Modern SDK Integration: Uses FoundryLocalManager and OpenAI client
  • Production Ready: Includes error handling, performance monitoring, and health checks
  • Comprehensive Examples: Interactive Jupyter notebook with advanced features

📁 Local Implementation:

  • samples/05/multi_agent_orchestration.ipynb - Interactive examples and benchmarks
  • samples/05/agents/specialists.py - Agent implementations
  • samples/05/agents/coordinator.py - Orchestration logic

Learning Objectives

  • Design system prompts and grounding strategies for reliable behavior
  • Implement function calling (tool use) patterns
  • Orchestrate multi-agent workflows (local and hybrid)
  • Plan for observability and safety

Part 1: System Prompts and Grounding

  • Define strict roles, constraints, and output schemas
  • Ground responses with local or enterprise data
  • Enforce JSON outputs for downstream automation
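
The last point above can be sketched as a small validator that sits between the model and any downstream automation. This is a minimal sketch, not part of the samples: the function name `parse_actions` and the required keys are assumptions chosen to match the `actions` structure requested from the execution agent later in this session.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema: keys every action item is expected to carry.
REQUIRED_ACTION_KEYS = {"step", "description", "priority"}

def parse_actions(raw: str) -> List[Dict[str, Any]]:
    """Extract the JSON object from model output and check its shape."""
    # Models often wrap JSON in prose or code fences, so keep only the
    # text between the first "{" and the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")

    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    actions = data.get("actions")
    if not isinstance(actions, list) or not actions:
        raise ValueError("'actions' must be a non-empty list")

    for item in actions:
        if not isinstance(item, dict):
            raise ValueError("each action must be a JSON object")
        missing = REQUIRED_ACTION_KEYS - set(item)
        if missing:
            raise ValueError(f"action is missing keys: {sorted(missing)}")
    return actions
```

Raising `ValueError` on any mismatch lets the caller decide whether to re-prompt the model or fail the pipeline, rather than passing malformed output downstream.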

Part 2: Function Calling (Modern SDK Approach)

# tools.py
import json
from typing import List, Dict, Any

def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 25C"

# Modern tools format for OpenAI API
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"}
                },
                "required": ["city"]
            }
        }
    }
]

# agent.py
import json
from typing import Dict, List

from foundry_local import FoundryLocalManager
from openai import OpenAI

from tools import TOOLS, get_weather

# Initialize Foundry Local Manager
alias = "phi-4-mini"
manager = FoundryLocalManager(alias)

# Create OpenAI client using Foundry Local endpoint
client = OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key
)

SYSTEM_PROMPT = "You are a helpful assistant. Use tools when needed."

def process_function_call(messages: List[Dict], tools: List[Dict]) -> str:
    """Process function calling with modern OpenAI API."""
    try:
        response = client.chat.completions.create(
            model=manager.get_model_info(alias).id,
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        
        message = response.choices[0].message
        
        if message.tool_calls:
            # Handle function calls
            messages.append(message)
            
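            for tool_call in message.tool_calls:
                if tool_call.function.name == "get_weather":
                    args = json.loads(tool_call.function.arguments)
                    result = get_weather(args["city"])
                else:
                    # Reply to every tool call; OpenAI-compatible servers may
                    # reject a follow-up request with an unanswered tool_call_id
                    result = f"Unknown tool: {tool_call.function.name}"

                # Add function result to messages
                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": result
                })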
            
            # Get final response
            final_response = client.chat.completions.create(
                model=manager.get_model_info(alias).id,
                messages=messages
            )
            return final_response.choices[0].message.content
        else:
            return message.content
            
    except Exception as e:
        return f"Error: {str(e)}"

# Example usage
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "What's the weather in Paris?"}
]

result = process_function_call(messages, TOOLS)
print(result)

Run:

# Ensure Foundry Local is running with a model
foundry model run phi-4-mini
python agent.py
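
The tool-handling loop in agent.py matches a single tool name inline. Once there are several tools, a name-to-function registry keeps that loop short. The sketch below is one way to do it under stated assumptions: `TOOL_REGISTRY` and `dispatch_tool` are hypothetical names, and `get_weather` is repeated only so the block runs on its own.

```python
import json
from typing import Callable, Dict

def get_weather(city: str) -> str:
    return f"Weather in {city}: Sunny, 25C"

# Hypothetical registry: maps the tool name the model emits to a Python callable.
TOOL_REGISTRY: Dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
}

def dispatch_tool(name: str, arguments: str) -> str:
    """Run a registered tool and always return a string the model can read."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        kwargs = json.loads(arguments or "{}")
        return func(**kwargs)
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
```

Inside the loop, the call would then be `result = dispatch_tool(tool_call.function.name, tool_call.function.arguments)`. Errors are returned as content rather than raised so the model can see what went wrong and try again.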

Part 3: Multi-Agent Orchestration (Pattern)

Design a coordinator that routes tasks to specialist agents (retrieval, reasoning, execution) using Foundry Local’s OpenAI-compatible endpoint.

Step 1) Define specialist agents with modern SDK (see samples/05/agents/specialists.py)

# agents/specialists.py
from foundry_local import FoundryLocalManager
from openai import OpenAI
from typing import List, Dict, Any

class FoundryClient:
    """Shared client for all specialist agents."""
    
    def __init__(self, model_alias: str = "phi-4-mini"):
        self.client = None
        self.model_name = None
        self.model_alias = model_alias
        self._initialize_client()
    
    def _initialize_client(self):
        """Initialize OpenAI client with Foundry Local."""
        try:
            manager = FoundryLocalManager(self.model_alias)
            model_info = manager.get_model_info(self.model_alias)
            
            self.client = OpenAI(
                base_url=manager.endpoint,
                api_key=manager.api_key
            )
            self.model_name = model_info.id
            print(f"✅ Foundry Local initialized with model: {self.model_name}")
        except Exception as e:
            print(f"❌ Error initializing Foundry Local: {e}")
            raise
    
    def chat(self, messages: List[Dict[str, str]], max_tokens: int = 300, temperature: float = 0.4) -> str:
        """Send chat completion request to the model."""
        try:
            response = self.client.chat.completions.create(
                model=self.model_name,
                messages=messages,
                max_tokens=max_tokens,
                temperature=temperature
            )
            return response.choices[0].message.content
        except Exception as e:
            return f"Error generating response: {str(e)}"

# Global client instance
_client = FoundryClient()

class RetrievalAgent:
    """Agent specialized in retrieving relevant information from knowledge sources."""
    
    SYSTEM = """You are a specialized retrieval agent. Your job is to extract and retrieve 
    the most relevant information from knowledge sources based on a given query. Focus on key facts, 
    data points, and contextual information that would be useful for decision-making."""
    
    def run(self, query: str) -> str:
        """Retrieve relevant information based on the query."""
        messages = [
            {"role": "system", "content": self.SYSTEM},
            {"role": "user", "content": f"Query: {query}\n\nRetrieve the most relevant key facts, data points, and contextual information that would help answer this query or support decision-making around it."}
        ]
        return _client.chat(messages)

class ReasoningAgent:
    """Agent specialized in step-by-step analysis and reasoning."""
    
    SYSTEM = """You are a specialized reasoning agent. Your job is to analyze inputs 
    step-by-step and produce structured, logical conclusions. Break down complex problems 
    into manageable parts and provide clear reasoning for your conclusions."""
    
    def run(self, context: str, question: str) -> str:
        """Analyze context and question to produce structured conclusions."""
        messages = [
            {"role": "system", "content": self.SYSTEM},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}\n\nAnalyze this step-by-step and provide a structured, logical conclusion with clear reasoning."}
        ]
        return _client.chat(messages, max_tokens=400)

class ExecutionAgent:
    """Agent specialized in creating actionable execution plans."""
    
    SYSTEM = """You are a specialized execution agent. Your job is to transform decisions 
    and conclusions into concrete, actionable steps. Always format your response as valid JSON 
    with an array of action items. Each action should be specific, measurable, and achievable."""
    
    def run(self, decision: str) -> str:
        """Transform decision into actionable steps in JSON format."""
        messages = [
            {"role": "system", "content": self.SYSTEM},
            {"role": "user", "content": f"Decision/Conclusion:\n{decision}\n\nCreate 3-5 specific, actionable steps to implement this decision. Format as JSON with this structure:\n{{\"actions\": [{{\"step\": 1, \"description\": \"...\", \"priority\": \"high/medium/low\", \"timeline\": \"...\"}}]}}"}
        ]
        return _client.chat(messages, max_tokens=400, temperature=0.3)

Step 2) Build the coordinator with advanced features

# agents/coordinator.py
from .specialists import RetrievalAgent, ReasoningAgent, ExecutionAgent
from typing import Dict, Any
import time
import json

class Coordinator:
    """Multi-agent coordinator that orchestrates specialist agents to handle complex tasks."""
    
    def __init__(self):
        """Initialize the coordinator with specialist agents."""
        self.retrieval = RetrievalAgent()
        self.reasoning = ReasoningAgent()
        self.execution = ExecutionAgent()
    
    def handle(self, user_goal: str) -> Dict[str, Any]:
        """
        Orchestrate multiple agents to handle a complex user goal.
        
        Args:
            user_goal: The user's high-level goal or request
            
        Returns:
            Dictionary containing the goal, context, decision, and actions
        """
        print(f"🎯 **Coordinator:** Processing goal: {user_goal}")
        print("=" * 60)
        
        start_time = time.time()
        
        # Step 1: Retrieve relevant context
        print("📚 **Step 1:** Retrieving context...")
        context = self.retrieval.run(user_goal)
        print(f"   ✅ Context retrieved ({len(context)} chars)")
        
        # Step 2: Analyze and reason about the context
        print("🧠 **Step 2:** Analyzing and reasoning...")
        decision = self.reasoning.run(context, user_goal)
        print(f"   ✅ Analysis completed ({len(decision)} chars)")
        
        # Step 3: Create actionable execution plan
        print("⚡ **Step 3:** Creating execution plan...")
        actions = self.execution.run(decision)
        print(f"   ✅ Execution plan created ({len(actions)} chars)")
        
        end_time = time.time()
        processing_time = end_time - start_time
        
        result = {
            "goal": user_goal,
            "context": context,
            "decision": decision,
            "actions": actions,
            "agent_flow": ["retrieval", "reasoning", "execution"],
            "processing_time": processing_time,
            "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")
        }
        
        print(f"✅ **Coordination Complete** (⏱️ {processing_time:.2f}s)")
        return result
    
    def handle_with_feedback(self, user_goal: str, feedback_rounds: int = 1) -> Dict[str, Any]:
        """
        Handle a goal with multiple feedback rounds for refinement.
        
        Args:
            user_goal: The user's high-level goal or request
            feedback_rounds: Number of feedback rounds to perform
            
        Returns:
            Dictionary containing the refined result
        """
        result = self.handle(user_goal)
        
        for round_num in range(feedback_rounds):
            print(f"\n🔄 **Feedback Round {round_num + 1}:**")
            print("-" * 40)
            
            # Use reasoning agent to refine the execution plan
            refinement_prompt = f"""
            Original Goal: {user_goal}
            Current Decision: {result['decision']}
            Current Actions: {result['actions']}
            
            Review the above and suggest improvements or refinements to make the execution plan more effective.
            """
            
            refined_decision = self.reasoning.run(result['context'], refinement_prompt)
            refined_actions = self.execution.run(refined_decision)
            
            result['decision'] = refined_decision
            result['actions'] = refined_actions
            result['refinement_rounds'] = round_num + 1
            
            print(f"   ✅ Round {round_num + 1} refinement completed")
        
        return result

def main():
    """Main function demonstrating the multi-agent coordinator."""
    print("🤖 **Multi-Agent Coordinator Demo**")
    print("=" * 50)
    
    # Create coordinator
    coord = Coordinator()
    
    # Example goals
    example_goals = [
        "Create a plan to onboard 5 new customers this month",
        "Develop a strategy to improve team productivity by 20%",
        "Design a customer feedback collection system"
    ]
    
    # Process example with feedback
    goal = example_goals[0]
    print(f"🎯 **Processing Goal:** {goal}")
    print("-" * 50)
    
    try:
        # handle_with_feedback() runs the basic pipeline itself before refining,
        # so a separate coord.handle(goal) call would only repeat the work
        refined_result = coord.handle_with_feedback(goal, feedback_rounds=1)
        
        print("\n📊 **Final Result:**")
        print("=" * 50)
        print(f"**Goal:** {refined_result['goal']}")
        print(f"**Processing Time:** {refined_result['processing_time']:.2f}s")
        
        # Try to parse actions as JSON
        try:
            actions_json = json.loads(refined_result['actions'])
            print(f"\n**Formatted Actions:**")
            print(json.dumps(actions_json, indent=2))
        except (json.JSONDecodeError, TypeError):
            print(f"\n**Actions:** {refined_result['actions']}")
            
    except Exception as e:
        print(f"❌ **Error:** {e}")
        print("\nPlease ensure Foundry Local is running with a model loaded.")

if __name__ == "__main__":
    main()

Step 3) Validate against Foundry Local and run samples

REM Confirm the local endpoint and model are available
foundry model list
foundry model run phi-4-mini
REM Adjust the port if `foundry service status` reports a different endpoint
curl http://localhost:8000/v1/models

REM Run the coordinator from Module08 directory
cd Module08
python -m samples.05.agents.coordinator

REM Or explore the comprehensive Jupyter notebook
jupyter notebook samples/05/multi_agent_orchestration.ipynb

📚 Local Sample References:

  • Main Implementation: samples/05/agents/specialists.py and samples/05/agents/coordinator.py
  • Comprehensive Examples: samples/05/multi_agent_orchestration.ipynb
  • Setup Instructions: samples/05/README.md

Guidelines:

  • Implement retries and timeouts between agents
  • Add a small in-memory store (dict) for conversation/thread state
  • Introduce rate-limiting when chaining multiple calls
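
Two of these guidelines can be sketched with the standard library alone. The class names `ThreadStore` and `MinIntervalLimiter` are hypothetical and not part of the samples; the defaults are placeholders to tune for your hardware.

```python
import time
from collections import defaultdict
from typing import Dict, List, Optional

class ThreadStore:
    """In-memory conversation state keyed by thread id (lost on restart)."""

    def __init__(self, max_messages: int = 20):
        self.max_messages = max(1, max_messages)
        self._threads: Dict[str, List[Dict[str, str]]] = defaultdict(list)

    def append(self, thread_id: str, role: str, content: str) -> None:
        msgs = self._threads[thread_id]
        msgs.append({"role": role, "content": content})
        # Keep only the most recent messages so prompts stay bounded
        del msgs[:-self.max_messages]

    def history(self, thread_id: str) -> List[Dict[str, str]]:
        return list(self._threads.get(thread_id, []))

class MinIntervalLimiter:
    """Enforces a minimum gap between consecutive calls."""

    def __init__(self, min_interval: float = 0.2):
        self.min_interval = min_interval
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept
```

A coordinator could call `limiter.wait()` before each agent invocation and record each exchange with `store.append(...)`. For retries, see the backoff helper in Part 4.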

Part 4: Observability and Safety

Track prompts, responses, and errors locally, while enforcing data hygiene in your agent stack.

Step 1) Lightweight request logging (optional)

Note: The following helper is not included by default. Create infra/obs.py if you want local JSON logging for experiments.

# infra/obs.py
import json
import os
from datetime import datetime, timezone

LOG_DIR = os.getenv("FOUNDRY_AGENT_LOG_DIR", "./agent_logs")
os.makedirs(LOG_DIR, exist_ok=True)

def log_event(kind: str, payload: dict):
    """Write one event as a JSON file named <UTC timestamp>_<kind>.json."""
    # Microsecond resolution keeps events logged within the same second
    # from overwriting each other
    ts = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    path = os.path.join(LOG_DIR, f"{ts}_{kind}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)

Integrate logging into agents (optional):

# in agents/specialists.py, inside FoundryClient.chat(...)
from infra.obs import log_event

# ... after self.client.chat.completions.create(...) returns `response`
log_event("chat_request", {"endpoint": str(self.client.base_url), "model": self.model_name})
log_event("chat_response", response.model_dump())
return response.choices[0].message.content

Step 2) Validate availability and basic health via CLI

REM Ensure Foundry Local is running a model
foundry model list
foundry model run phi-4-mini

REM Validate the OpenAI-compatible endpoint
REM Adjust the port if `foundry service status` reports a different endpoint
curl http://localhost:8000/v1/models

Step 3) Redaction and PII hygiene

  • Before sending messages to the model, strip or hash sensitive fields (emails, phone numbers, IDs)
  • Keep raw source data on-device, only pass necessary context strings

Example redaction helper:

# infra/redact.py
import re
EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")

def sanitize(text: str) -> str:
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text

Use in agents:

from infra.redact import sanitize
# user_goal = sanitize(user_goal)
# context = sanitize(context)
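
The first bullet in this step mentions hashing as an alternative to stripping. Hashing can be useful when the model still needs to tell two people apart without seeing their addresses. Below is a minimal sketch under stated assumptions: the function name `pseudonymize_emails`, the token format, and the default salt are all hypothetical, and a real deployment would load the salt from a secret store.

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[\w\.-]+@[\w\.-]+")

def pseudonymize_emails(text: str, salt: str = "local-dev-salt") -> str:
    """Replace each email with a short, stable token derived from a salted hash."""

    def _token(match: "re.Match[str]") -> str:
        normalized = match.group(0).lower()
        digest = hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()
        # Eight hex characters keep the token short; lengthen if collisions matter
        return f"[EMAIL_{digest[:8]}]"

    return EMAIL_RE.sub(_token, text)
```

The same address always maps to the same token, so references stay consistent across a conversation. Note that a short salted hash is pseudonymization, not anonymization: anyone with the salt can test candidate addresses against a token.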

Step 4) Circuit breakers and error handling

  • Wrap each agent call with try/except and exponential backoff
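  • Short-circuit the pipeline on repeated failures

import time

def with_retry(func, retries=3, base_delay=0.5):
    """Call func(), retrying with exponential backoff; re-raise after the last attempt."""
    for i in range(retries):
        try:
            return func()
        except Exception:
            if i == retries - 1:
                raise
            # Delay grows as base_delay * 1, 2, 4, ...
            time.sleep(base_delay * (2 ** i))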
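
The retry helper covers backoff but not the short-circuit behavior named in the second bullet. One possible shape for that is a small circuit breaker. The names `CircuitBreaker` and `CircuitOpenError` and the default thresholds are assumptions for illustration, not part of the samples.

```python
import time
from typing import Any, Callable, Optional

class CircuitOpenError(RuntimeError):
    """Raised when calls are skipped because the circuit is open."""

class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, func: Callable[[], Any]) -> Any:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: let one trial call through. The failure
            # count is left as-is, so a single failure re-opens the circuit.
            self.opened_at = None
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

A coordinator could keep one breaker per agent and wrap calls as `breaker.call(lambda: agent.run(goal))`, catching `CircuitOpenError` to abort the pipeline early instead of waiting on a model that keeps failing.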

Step 5) Local audit trail and export

  • Store JSON logs under ./agent_logs
  • Periodically compress and rotate logs
  • Export summaries for reviews (counts, avg latency, error rates)
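
The summary export in the last bullet can be sketched as a single pass over the log directory. This assumes the `<timestamp>_<kind>.json` file naming used by `log_event` above. The payload fields `latency_s` and `error` are hypothetical: the logging helper in this session does not write them, so you would need to add them to your payloads.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

def summarize_logs(log_dir: str = "./agent_logs") -> Dict[str, Any]:
    """Aggregate counts, average latency, and error rate from JSON log files."""
    by_kind: Dict[str, int] = {}
    latencies: List[float] = []
    errors = 0
    total = 0

    for path in sorted(Path(log_dir).glob("*.json")):
        total += 1
        # File names look like <timestamp>_<kind>.json
        kind = path.stem.split("_", 1)[-1]
        by_kind[kind] = by_kind.get(kind, 0) + 1
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            errors += 1  # an unreadable log file is counted as an error
            continue
        if isinstance(payload, dict):
            if "error" in payload:  # hypothetical field
                errors += 1
            latency = payload.get("latency_s")  # hypothetical field
            if isinstance(latency, (int, float)):
                latencies.append(float(latency))

    return {
        "total": total,
        "by_kind": by_kind,
        "avg_latency_s": sum(latencies) / len(latencies) if latencies else None,
        "error_rate": errors / total if total else 0.0,
    }
```

Running this before rotating or compressing the directory gives a small record to keep for reviews after the raw logs are archived.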

Step 6) Cross-check with Microsoft Learn docs

  • Foundry Local serves an OpenAI-compatible API (validated with curl /v1/models)
  • Use foundry model run <name> to confirm model availability
  • Follow official guidance for client integration and sample apps (Open WebUI/how-tos)

Next Steps

  • Explore Azure AI Agents for cloud-hosted orchestration
  • Add enterprise connectors (Microsoft Graph, Search, databases)