Python + AI Weekly Office Hours: Recordings & Resources #280

pamelafox · 2026-01-07T01:13:50Z

pamelafox
Jan 7, 2026

Each week, we hold weekly office hours about all things Python + AI in the Foundry Discord.
Join the Discord here: http://aka.ms/aipython/oh

This thread will list the recordings of each office hours, and any other resources that come out of the OH sessions. The questions and answers are automatically posted (based on the transcript) as comments in this thread.

May 19, 2026

Recording

Topics covered:

May 12, 2026

Recording

Topics covered:

April 28, 2026

Recording

Topics covered:

Update: Do Foundry evaluations stay in your tenant?
How could we use GraphRAG from Cosmos DB in a hosted agent for memory and knowledge?
Which model is best for RAG-based chatbots?
How come I can't deploy the Mistral OCR model anymore?
I'm getting 408 timeouts when asking the model to query multiple tools at once — is it a prompt issue or a model issue?
Any inputs on PageIndex vs. vector RAG?
Announcements

April 20, 2026

Recording

Topics covered:

What's a good workflow for pulling entities out of PDFs?
Bug report: Sporadic 400 errors from the Azure AI Search vectorization endpoint
Bug report: Authentication succeeds but tool calls fail with the Foundry Atlassian MCP server
Any tips for the Vancouver Web Summit hackathon?
Announcements

April 13, 2026

Recording

Topics covered:

What is WorkIQ MCP and what does it work best for?
Would autoresearch make sense for automatic RAG parameter tuning?
Is deploying OpenClaw safe? What about privacy concerns?
How do I publish a Foundry hosted agent as an M365 Copilot agent?
Are there good resources to dig deeper on AI agents deployment?
Announcements

April 7, 2026

Recording

Topics covered:

How did you generate the news slides using Copilot CLI and MCP servers?
When working with a multi-tool calling workflow, is generating a skill to guide the agent the right design?
What's new with Agent Framework 1.0.0?
GitHub Copilot now works on Dependabot PRs
New in VS Code: Copilot Chat Customizations viewer
New in FastMCP 3.0: OpenTelemetry support
Are open models now viable for agent workloads? What about Gemma 4?
How is Copilot CLI accessing the web browser so easily?
What techniques does GitHub Copilot use to search large repos?
Would it be possible to do a Q&A with a VS Code team member?
Announcements

March 31, 2026

Recording

Topics covered:

What is the GA version of agent framework? Can we install it now?
What are some good local models for agents?
What is GEPA's "Optimize Anything"?
Has anyone at Microsoft tested TurboQuant for local models?
Have you used the Work IQ MCP server?
Does Microsoft have anything to check agent skills for security and effectiveness?
Have you seen the "Mirage" problem with multimodal LLMs?
Why are people saying MCP is dead? Is it because of skills?
What are some recommended talks from the Py AI conference?
Ollama now powered by MLX on Apple Silicon?
When can we see agent framework durable agents in an Azure Functions series?
Is private networking supported for Foundry hosted agents?
Any tips to maximize GitHub Copilot Auto output quality with the Codex model?
How to start learning AI? What about the "AI Engineering" book?
Announcements

March 24, 2026

Recording

Topics covered:

What's the difference between the chat completions API and the responses API?
Can VS Code skills be a replacement for MCP servers?
What happened with the LiteLLM supply chain attack?
Is there a way to host agents in AKS and use Foundry orchestrations?
Do you have recommendations for getting started with evals?
Announcement: New agent-framework release candidate (RC5)
Announcement: Deploy agents to Foundry with azd
Announcement: Upcoming live stream series on Foundry Hosted Agents (April 28–30)
Demo: Foundry Agent Service

March 17, 2026

Recording

Topics covered:

Announcement: Foundry Agent Service goes GA
Announcement: What's new with GitHub Copilot
Does OpenAI/Azure OpenAI see your data?
Can you talk about Claude and Claude Code?
How do web search and code interpreter work with the Responses API?
How important is system design for multi-agent systems?
What is your opinion about RL (reinforcement learning) on the agent layer?
What's your opinion on SaaS providers offering agents instead of direct data access?
What are your go-to resources for system design interviews?
Can auto-research be used to improve MCP servers or agents?
Upcoming events (Online and in-person)

February 17, 2026

Recording

Topics covered:

Announcements
Claude Sonnet 4.6 Availability
Any update on Anthropic models available over Azure for CSP clients?
Can students or free accounts access Foundry models?
What is WebMCP and how does it work?
What are good use cases for Microsoft Agent Framework?
What are GitHub Agentic Workflows?
How are agent skills different from system prompts? What should go in AGENTS.md?
When should you build a custom agent vs. use lighter weight tools?
Are Foundry Hosted Agents running on Microsoft-managed infrastructure expected?
Is anyone actually solving reranking for mixed-media RAG?
How do you get an agent built with Agent Framework to use online evals in Azure Foundry?

February 10, 2026

Recording

Topics covered:

February 3, 2026

Recording

Topics covered:

What security concerns exist around OpenClaw and Moltbook?
What is the new Codex app and how does it compare to GitHub Copilot?
What are Skills and how do they work in GitHub Copilot?
What's a good workflow for handling PR code reviews?
How can you run multiple agents in parallel?
What is Pamela working on with MCP tool schemas?
Is it safe to use an agent for LinkedIn job searching?

January 27, 2026

Recording

Topics covered:

What is MCP Apps support in VS Code?
What's new in VS Code Insiders?
What is the GitHub Copilot SDK and CLI?
What developer hackathons are coming up?
Is this a good place to ask about Microsoft Foundry SDK or Agent Framework SDK?
Are you using Spec-Driven Development (SDD) or SpecKit to guide coding agents?
What's new with the RAG demo - ACL support?
Have you tried memory tools in GitHub Copilot?
When should I use Foundry IQ knowledge bases vs MCP tools?
What tools do you use to automate developer workflows?
What is Work IQ?

January 20, 2026

Recording

Topics covered:

News & Updates
Upcoming Events
What podcasts would you recommend for learning Python and AI?
How do you tackle feeling lost when trying to implement what you learned from tutorials?
What are NLUs and CLUs, and when should you use them?
How do you handle context switching when users change topics mid-conversation with sub-agents?
Is it possible for someone from Pakistan to get an internship and then full-time role at Microsoft?
As a fresher, what areas in AI should I focus on and how do I build a portfolio?
How do you get experience for AI engineering jobs as a fresher when they all require experience?
What do you think about coding agents like Claude Code, OpenCode, and Antigravity for complex tasks?
Who maintains spec-kit now that Den Delimarsky left Microsoft?

January 13, 2026

Recording

Topics covered:

What advantages do other formats have over .txt for prompts?
What is the future of AI and which specialization should I pursue?
Which livestream series should I follow to build a project using several tools and agents?
How does Azure manage the context window?
How do we deal with context rot and how do we summarize context?
Have you seen or implemented anything related to AG-UI or A2UI?
Can you comment on using a "harness" for long-running agents?
What do you think of Hindsight for agent memory?
What was Pamela working on from Jan 6-Jan 13?

January 6, 2025

Recording

Topics covered:

How do you set up Entra OBO flow for Python MCP servers?
Which MCP inspector should I use for testing servers with Entra authentication?
How do you track LLM usage tokens and costs?
How do you keep yourself updated with all the new changes related to AI?
How do you build a Microsoft Copilot agent in Python?
How do I learn about AI from scratch as a backend developer?
What's new with the RAG demo after SharePoint was added?
Will companies create internal MCP servers?

pamelafox · 2026-01-08T05:44:18Z

pamelafox
Jan 8, 2026
Author

2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to?

Yes, this is already happening quite a bit. Common use cases include:

Internal documentation servers
Data analytics access for non-developers
Ticketing systems
Debugging tools

A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL.

The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there.

Links shared:

Pragmatic Engineer: Building MCP servers in the real world

0 replies

pamelafox · 2026-01-08T05:58:10Z

pamelafox
Jan 8, 2026
Author

2026/01/06: How do you set up Entra OBO (On-Behalf-Of) flow for Python MCP servers?

📹 5:48

The demo showed how to use the Graph API with the OBO flow to find out the groups of a signed-in user and use that to decide whether to allow access to a particular tool.

The flow works as follows:

Get the access token from the middleware
Exchange that access token for a Graph API token using the OBO flow with a specific scope
Use the Graph token to call graph.microsoft.com/v1/me/memberOf to check group membership
Filter by the specific group ID you want to check (more efficient than getting all groups and paginating)
If the count returned is 1, the user is in the group; if 0, they're not

For the authentication dance, FastMCP handles the DCR (Dynamic Client Registration) flow since Entra itself doesn't support DCR natively.

To test from scratch:

Go to "Authentication: Remove Dynamic Authentication Providers" in VS Code
Clear the localhost authentication
Start the server
When you start, VS Code detects the 401, attempts DCR flow, gets the PRM, and sees the authorization server supports DCR
Allow access on the FastMCP consent screen
It briefly jumps to login.microsoftonline.com before returning

Links shared:

PR to python-mcp-demos to add Entra OBO

0 replies

pamelafox · 2026-01-08T05:58:11Z

pamelafox
Jan 8, 2026
Author

2026/01/06: Which MCP inspector should I use for testing servers with Entra authentication?

📹 20:24

The standard MCP Inspector doesn't work well with Entra authentication because it doesn't do the DCR (Dynamic Client Registration) dance properly.

MCP Jam is recommended instead because it properly handles the OAuth flow with DCR. To set it up:

Install MCP Jam
Add a server over HTTP (e.g., localhost:8000/mcp)
Configure OAuth with your scopes
It will go through the full registration flow

MCP Jam also has nice features like:

Saved requests for replaying the same request repeatedly during development
An OAuth debugger with a diagram showing the whole flow
A chat interface for testing your server with different models

One note: enum values in tools don't yet show as dropdowns in MCP Jam (issue to be filed).

Links shared:

MCPJam Inspector

What's the difference between MCP Jam and LM Studio?

📹 34:19

LM Studio is primarily for playing around with LLMs locally. MCP Jam has some overlap since it includes a chat interface with access to models, but its main purpose is to help you develop MCP servers and apps. It's focused on the development workflow rather than just chatting with models.

0 replies

pamelafox · 2026-01-08T05:58:12Z

pamelafox
Jan 8, 2026
Author

2026/01/06: How do you track LLM usage tokens and costs?

📹 28:04

For basic tracking, Azure portal shows metrics for token usage in your OpenAI accounts. You can see input tokens and output tokens in the metrics section.

You can also:

Log custom metrics with OpenTelemetry
Use Langfuse
Use LiteLLM (mentioned by a community member)

If you use multiple providers, you need a way to consolidate the tracking. OpenTelemetry metrics could work but you'd need a way to hook into each system.

0 replies

pamelafox · 2026-01-08T05:58:13Z

pamelafox
Jan 8, 2026
Author

2026/01/06: How do you keep yourself updated with all the new changes related to AI?

📹 30:32

Several sources recommended:

Company chat channels (e.g., generative AI chat, GitHub Copilot chat) for sharing what people are experimenting with
Newsletters from LangChain, Pydantic AI, etc.
LinkedIn, Hacker News
Specific bloggers

Particularly recommended:

Elite AI Assisted Coding newsletter - Great for agentic coding tips, run by Isaac and Eleanor who experiment with everything
Drew Breunig's blog - A developer who writes thoughtful pieces about LLMs

Links shared:

How I learn about generative AI (blog post)

0 replies

pamelafox · 2026-01-08T05:58:14Z

pamelafox
Jan 8, 2026
Author

2026/01/06: How do you build a Microsoft Copilot agent in Python with custom API calls?

📹 36:30

For building agents that work with Microsoft 365 Copilot (which appears in Windows Copilot and other Microsoft surfaces):

Use the Agent Framework - it has a demo for M365 integration
Test locally using the agent playground
Deploy to Microsoft 365 using the deployment docs

The agent framework team is responsive if there are issues.

Links shared:

0 replies

pamelafox · 2026-01-08T05:58:16Z

pamelafox
Jan 8, 2026
Author

2026/01/06: As a backend developer with a non-CS background, how do I learn about AI from scratch?

📹 46:39

Recommended approach:

Watch the Python + AI series from October (if you understand Python, it's at a good level)
Read the AI Engineering book by Chip Huyen
Build stuff - go back and forth between learning and doing

Links shared:

0 replies

pamelafox · 2026-01-08T05:58:17Z

pamelafox
Jan 8, 2026
Author

2026/01/06: What's new with the RAG demo (azure-search-openai-demo) after the SharePoint data source was added?

📹 49:50

The main work is around improving ACL (Access Control List) support. The cloud ingestion feature was added recently, but it doesn't yet support ACLs. The team is working on making ACLs compatible with all features including:

Cloud ingestion
SharePoint Online document libraries
ADLS (Azure Data Lake Storage Gen2)

A future feature idea: adding an MCP server to the RAG repo for internal documentation use cases, leveraging the Entra OBO flow for access control.

0 replies

pamelafox · 2026-01-08T05:58:18Z

pamelafox
Jan 8, 2026
Author

2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to?

📹 53:53

Yes, this is already happening quite a bit. Common use cases include:

Internal documentation servers
Data analytics access for non-developers
Ticketing systems
Debugging tools

A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL.

The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there.

Links shared:

Pragmatic Engineer: Building MCP servers in the real world

0 replies

pamelafox · 2026-01-13T23:26:36Z

pamelafox
Jan 13, 2026
Author

2026/01/13: What advantages do other formats have over .txt for prompts? How do you improve prompts with DSPy and evals?

📹 4:55

Prompty is a template format that mixes Jinja and YAML together. The YAML goes at the top for metadata, and the rest is Jinja templating. Jinja is the most common templating system for Python (used by Flask, etc.). The nice thing about Jinja is you can pass in template variables—useful for customization, passing in citations, etc. Prompty turns the file into a Python list of chat messages with roles and contents.

However, we're moving from Prompty to plain Jinja files because:

Prompty doesn't support the Responses API
Prompty hasn't seen much adoption—it's hard to get people to adopt new formats
It's easier to just use text files, markdown files, or something people already know

Recommendation: Keep prompts separate from code when possible, especially long system prompts. Use plain .txt or .md if you don't need variables, or Jinja if you want to render variables. With agents and tools, some LLM-facing text (like tool descriptions in docstrings) will inevitably live in your code—that's fine.

For iterating on prompts: Run evaluations, change the prompt, and see whether it improves things. There are tools like DSPy and Agent Framework's Lightning that do automated prompt optimization/fine-tuning. Lightning says it "fine-tunes agents" but may actually be doing prompt changes. Most of the time, prompt changes don't make a huge difference, but sometimes they might.

Links shared:

0 replies

pamelafox · 2026-01-13T23:26:37Z

pamelafox
Jan 13, 2026
Author

2026/01/13: What is the future of AI and which specialization should I pursue?

📹 11:54

If you enjoy software engineering and full-stack engineering, it's more about understanding the models so you understand why they do what they do, but it's really about how you're building on top of those models. There's lots of interesting stuff to learn, and it really depends on you and what you're most interested in doing.

0 replies

pamelafox · 2026-01-13T23:26:38Z

pamelafox
Jan 13, 2026
Author

2026/01/13: Which livestream series should I follow to build a project using several tools and agents, and should I use a framework?

📹 13:33

Everyone should understand tool calling before moving on to agents. From the original 9-part Python + AI series, start with tool calling, then watch the high-level agents overview. The upcoming six-part series in February will dive deeper into each topic, especially how to use Agent Framework.

At the bare minimum, you should understand LLMs, tool calling, and agents. Then you can decide whether to do everything with just tool calling (you can do it yourself with an LLM that has tool calling) or use an agent framework like LangChain or Agent Framework if you think it has enough benefits for you.

It's important to understand that agents are based on tool calling—it's the foundation of agents. The success and failure of agents has to do with the ability of LLMs to use tool calling.

Links shared:

0 replies

pamelafox · 2026-01-13T23:26:39Z

pamelafox
Jan 13, 2026
Author

2026/01/13: How does Azure manage the context window? How do I maintain a long conversation with a small context window?

📹 15:21

There are three general approaches:

Send the last N messages - This is the most naive approach, but you don't actually know if they're going to fit.
Send only messages that fit - Pre-count the tokens and only send messages that fit inside your remaining tokens. This is hard to do correctly as you're basically reverse-engineering the models to figure out how to calculate tokens. There's a library for this if you want to attempt it.
Summarize the conversation - When the conversation gets too long, make a call to summarize it. You can either wait for an error from the LLM that says the context is too long and then summarize, or proactively summarize before hitting the limit.

With today's large context windows (128K, 256K), it's often easier to just wait for an error and tell the user to start a new chat, or do summarization when the error occurs. This approach is most likely to work across models since every model should throw an error when you're over the context window.

Links shared:

0 replies

pamelafox · 2026-01-13T23:26:39Z

pamelafox
Jan 13, 2026
Author

2026/01/13: How do we deal with context rot and how do we summarize context using progressive disclosure techniques?

📹 19:17

Read through Kelly Hong's (Chroma researcher) blog post on context rot. The key point is that even with a 1 million token context window, you don't have uniform performance across that context window. She does various tests to see when performance starts getting worse, including tests on ambiguity, distractors, and implications.

A general tip for coding agents with long-running tasks: use a main agent that breaks the task into subtasks and spawns sub-agents for each one, where each sub-agent has its own focused context. This is the approach used by the LangChain Deep Agents repo.

You can also look at how different projects implement summarization. LangChain's summarization middleware is open source—you can see their summary prompt and approach. They do approximate token counting and trigger summarization when 80% of the context is reached.

Links shared:

How do I deal with context issues when using the Foundry SDK with a single agent?

📹 25:03

If you're using the Foundry SDK with a single agent (hosted agent), you can implement something like middleware through hooks or events. Another approach is the LangChain Deep Agents pattern: implement sub-agents as tools where each tool has a limited context and reports back a summary of its results to the main agent.

For the summarization approach with Foundry agents, you'd need to figure out what events, hooks, or middleware systems they have available.

0 replies

pamelafox · 2026-01-13T23:26:40Z

pamelafox
Jan 13, 2026
Author

2026/01/13: Have you seen or implemented anything related to AG-UI or A2UI?

📹 29:02

AG-UI (Agent User Interaction Protocol) is an open standard introduced by the CopilotKit team that standardizes how front-end applications communicate with AI agents. Both Pydantic AI and Microsoft Agent Framework have support for AG-UI—they provide adapters to convert messages to the AG-UI format.

The advantage of standardization is that if people agree on a protocol between backend and frontend, it means you can build reusable front-end components that understand how to use that backend.

Agent Framework also supports different UI event stream protocols, including Vercel AI (though Vercel is a competitor, so support may be limited). These are adapters—you can always adapt output into another format if needed, but it's nice when it's built in.

A2UI is created by Google with Consortium CopilotKit and relates to A2A (Agent-to-Agent). A2UI appears to be newer with less support currently in Agent Framework, though A2A is supported.

Links shared:

0 replies

kinthaiofficial · 2026-04-29T00:10:34Z

kinthaiofficial
Apr 29, 2026

For teams moving from single-agent Python prototypes to multi-agent production, the Python patterns that seem natural become anti-patterns at scale. A few common issues:

Sharing OpenAI/Anthropic clients across agents — fine in prototypes, breaks in production. Each agent should have its own client (or at least its own rate limit tracking) because a single starved agent can exhaust the shared client's rate limits, silently degrading all other agents.

Using Python asyncio for agent concurrency — the event loop becomes a bottleneck. Agent LLM calls are I/O-bound but the response processing (context assembly, memory updates) can be CPU-bound. Better: use separate worker processes with message queues between them.

Context = conversation history — the natural pattern in Python AI code is to append every turn to a list and pass it as messages. At 100+ turns, this becomes a context explosion. You need progressive compaction: summarize older turns, keep recent ones verbatim.

No budget tracking = production accidents — Python makes it easy to await client.messages.create(...) without tracking what you've spent. In a multi-agent system, one agent's runaway loop can cost hundreds of dollars before you notice. Budget tracking needs to be at the infrastructure layer, not left to each agent's application code.

Naive retry logic amplifies costs — @retry(attempts=3) on a token-heavy LLM call means a single failed request can triple your cost on that turn. Retries should be smarter: exponential backoff, check if the error is retriable before retrying, use a simpler model on retry rather than the same expensive one.

These patterns from production: https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents

0 replies

pamelafox · 2026-05-20T19:42:24Z

pamelafox
May 20, 2026
Author

2026/05/19: Which TTS model do you recommend using locally (offline)?

📹 22:32

Pamela hasn't used local TTS models much herself, but noted that most people recommend Whisper or using Whisper inside some kind of harness for offline speech. She mentioned that she just got a computer with 64 GB of RAM, so she's hoping to start experimenting more with local models. She also checked CanIRun.ai but it doesn't seem to list TTS models. She encouraged others in the chat to share their recommendations for offline TTS.

0 replies

pamelafox · 2026-05-20T19:42:25Z

pamelafox
May 20, 2026
Author

2026/05/19: How can I experiment with generative UI / A2UI?

📹 24:08

For generative UI, there are a few options depending on what you're building:

FastMCP already supports generative UI, where an MCP server can respond to requests with UI components on the fly (constrained to prefab components for safety). This is a good starting point if you're building MCP servers that output apps.
AG-UI is the protocol that Agent Framework (MAF) integrates with for front-end/back-end agent communication.
A2UI (from Google) is a declarative, LLM-friendly generative UI spec. Support for A2UI in Agent Framework Python is currently in progress — an engineer on the MAF team confirmed this in a discussion thread. No PR has been pushed yet, so subscribing to that discussion is recommended.
CopilotKit is the open-source team behind the AG-UI protocol. There's a DeepLearning.ai course on building interactive agents with generative UI using CopilotKit that could be a good learning resource.

Links shared:

0 replies

pamelafox · 2026-05-20T19:42:26Z

pamelafox
May 20, 2026
Author

2026/05/19: What is Pydantic Monty and how can it be used with agents?

📹 29:46

Pydantic Monty is a sandboxed Python runtime (re-implemented in Rust) that lets you run a subset of Python safely. There are two main use cases for agents:

1. Code mode (reducing tool-calling overhead): Instead of the traditional back-and-forth of many individual tool calls, the LLM writes Python code that calls tools as if they were functions. This code can do sequences, for loops, async gathers, etc. — all in one shot. This dramatically reduces tokens and round trips. If you're using Pydantic AI, it's one line of code to add Monty as a code mode backend. For Agent Framework, there's a PR in review that adds Monty as a CodeAct provider.

2. Code execution tool: You can add a code execution tool to your agent that uses Monty for fast, safe computation. For example, giving an agent the ability to do date/time math or calculations without relying on the LLM's (often unreliable) arithmetic. You could even have two code execution tools: a fast Monty tool for datetime/math, and a slower hosted code interpreter (with access to pandas, numpy, etc.) for more complex tasks.

Samuel Colvin (Pydantic's creator) gave an extended talk about Monty at PyCon US, explaining how they re-implemented Python in Rust. There's also a "Hack Monty" competition with a $10,000 bounty to find security issues, which helps validate the sandbox's safety.

Links shared:

0 replies

pamelafox · 2026-05-20T19:42:27Z

pamelafox
May 20, 2026
Author

2026/05/19: Can I use GitHub Copilot (VS Code) with a model deployed in Foundry?

📹 38:22

Yes! You can add Foundry models to VS Code via Manage Language Models (in the model picker dropdown). This creates a chatLanguageModels.json file in your user settings directory. The main documentation covers the Azure vendor option, which works for OpenAI models. For non-OpenAI models (like Claude), you need VS Code Insiders and the custom endpoint vendor.

During the session, Pamela had trouble getting the "Azure" provider option to work. After live-debugging with attendees and reading the VS Code source code, she discovered the Azure provider only recognizes URLs containing models.ai.azure.com or openai.azure.com. This means it works for Azure OpenAI models, but not for non-OpenAI models deployed on Foundry (which use services.ai.azure.com). This is what she got working after the livestream:

Option 1: Azure vendor (for Azure OpenAI models only)

For OpenAI models on Foundry, you can use "vendor": "azure" with either key-based or Entra authentication. The URL should be just the base endpoint (without path suffixes):

{
    "name": "Work Foundry",
    "vendor": "azure",
    "models": [
        {
            "id": "gpt-5.2",
            "name": "gpt-5.2-work",
            "url": "https://YOUR-ACCOUNT.openai.azure.com",
            "toolCalling": true,
            "vision": true,
            "maxInputTokens": 128000,
            "maxOutputTokens": 16000
        }
    ]
}

Option 2: Custom endpoint vendor (for ANY model, including non-OpenAI — requires API key)

For non-OpenAI models (like Claude) or if you want a single configuration for all models, use "vendor": "customendpoint" instead, which works with any URL. Note that this option requires an API key and does not support keyless/Entra authentication:

{
    "name": "Personal Foundry",
    "vendor": "customendpoint",
    "apiKey": "${input:chat.lm.secret.YOUR_SECRET_ID}",
    "apiType": "responses",
    "models": [
        {
            "id": "gpt-5.4",
            "name": "gpt-5.4-mine",
            "url": "https://YOUR-FOUNDRY.openai.azure.com/openai/v1/responses",
            "apiType": "responses",
            "toolCalling": true,
            "vision": true,
            "maxInputTokens": 128000,
            "maxOutputTokens": 16000
        },
        {
            "id": "claude-sonnet-4-6-DEPLOYMENT",
            "name": "claude-sonnet-mine",
            "url": "https://YOUR-FOUNDRY.services.ai.azure.com/anthropic/v1/messages",
            "apiType": "messages",
            "toolCalling": true,
            "vision": true,
            "maxInputTokens": 128000,
            "maxOutputTokens": 16000
        }
    ]
}

Key details:

Use "apiType": "responses" for OpenAI models and "apiType": "messages" for Anthropic models
The id should match your deployment name in Foundry
The URL format differs: openai.azure.com/openai/v1/responses for OpenAI models, services.ai.azure.com/anthropic/v1/messages for Anthropic models

Links shared:

0 replies

pamelafox · 2026-05-20T19:42:29Z

pamelafox
May 20, 2026
Author

2026/05/19: Announcements

📹 0:00

PyCon US recap:

Pamela ran an MCP tutorial covering elicitations and MCP apps (newer parts of the protocol), with self-paced exercises for learning to build MCP servers in Python.
She gave a talk at the EduSummit about making slides using Reveal.js and GitHub Copilot, with tips on using VS Code's integrated browser for rapid front-end iteration. The related presentation-skills repo contains skills for AI agents to process presentations.

GitHub Copilot App (desktop technical preview):

📹 4:57

The new GitHub Copilot App is a desktop app wrapping the Copilot CLI, available for Enterprise and Business subscribers. Key features demonstrated:

Better UI for viewing sessions, repos, inline diffs, and PRs
MCP server and skills integration (same as CLI)
Agent merge for automated CI fixing
Workflows: scheduled personal automations (daily issue triage, weekly recaps, etc.) with a gallery of templates
Web search tool built in
Repo-less chats for general questions

New MAI models in Foundry:

📹 11:46

Three new MAI models are available:

MAI-Image-2E: An image generation model that excels at facial likeness preservation and photo-realistic output. Pamela demonstrated using it at the PyCon booth to generate images from source photos. The model card details evaluation categories, red teaming, and training data.
MAI-Transcribe-1: For transcription
MAI-Voice-1: For voice generation

Links shared:

0 replies

pamelafox · 2026-05-20T20:21:56Z

pamelafox
May 20, 2026
Author

2026/05/12: Where could I start learning about Python and AI?

📹 0:31

Pamela recommended her blog post about how she personally learns about generative AI, which links to the Python + AI video series at the bottom. The most important thing is to actually try things hands-on — that's the way you really learn. The Python + AI video series is designed for someone who knows some Python but doesn't yet know generative AI. All the videos are free to watch on YouTube.

Links shared:

0 replies

pamelafox · 2026-05-20T20:21:57Z

pamelafox
May 20, 2026
Author

2026/05/12: Do you have links for free Azure credits to use Foundry?

📹 1:48

Pamela doesn't have coupons to give out. The Azure free trial exists, but it likely doesn't work for Foundry model usage. Foundry's pricing page says it's "free to use and explore," but that likely just means you can browse the leaderboard and poke around — as soon as you start using models, you need to pay. That said, if you're just doing a few small tests, it's not very expensive — Pamela got a bill of about a dollar for her tests with Anthropic models building an agent. You really only need a proper payment setup if you're developing an app or doing evaluations that require many calls.

0 replies

pamelafox · 2026-05-20T20:21:58Z

pamelafox
May 20, 2026
Author

2026/05/12: How did the Code with Claude workshop use Foundry?

📹 4:13

Pamela attended the Code with Claude conference and ran a workshop using Claude models (Sonnet, Haiku, and Opus) deployed from Foundry. They used Microsoft Agent Framework pointed at an Anthropic model deployment to build a cupcake store ordering agent. The workshop repo shows how to set up the deployment name, API key, and endpoint.

You can also watch a video of Pamela walking through the agent.

Links shared:

Code with Claude Foundry workshop

0 replies

pamelafox · 2026-05-20T20:21:59Z

pamelafox
May 20, 2026
Author

2026/05/12: With the Copilot pricing changes and 9x multiplier, do you have any recommendations on getting around this?

📹 7:07

GitHub Copilot is moving to usage-based billing starting June 1st, based on token usage. Pamela shared several free alternatives for prototyping:

NVIDIA NIM — A developer she met said you get free access to NIM API endpoints for "unlimited prototyping" (though there are likely limits like slower speeds)
GitHub Models — Free but hasn't been updated with the most recent models and doesn't support the responses API
Ollama — Run models locally for free if your machine is powerful enough. Pamela currently runs Qwen 3.5 and Gemma 4 locally on her new Mac M5 Max
CanIRun.ai — A website that predicts which models you can run on your machine based on your hardware

For local models, Pamela noted that QPTD OS is good for agentic stuff, and she planned to download it for her new machine.

Links shared:

0 replies

pamelafox · 2026-05-20T20:22:01Z

pamelafox
May 20, 2026
Author

2026/05/12: Can you give advice to AI student graduates on what skills to pick up to get interviews lined up?

📹 14:03

For software engineering roles, Pamela recommended:

Know how to use agentic coding tools — GitHub Copilot, Claude Code, or Codex. Companies increasingly expect developers to use these tools, and they help you write code faster even if you just use them as supercharged autocomplete.
Know how to incorporate an LLM into things you're building — Think of it like knowing regular expressions. Sometimes you need a regex, and now sometimes you need an LLM. LLMs are replacing regex in many places where you previously did fuzzy matching. You should be able to reach for an LLM as a tool and know how to write different kinds of tests for non-deterministic behavior.
Standard interview prep — Look at job descriptions for specific skills they want (distributed systems, API design, etc.), use Glassdoor for interview questions, and prepare for algorithmic/data structures questions.

The best approach is to find a job posting you're interested in, identify the skills it requires, and work on those specific skills.

Can we be picky on which companies we want when we have no real experience?

📹 19:00

If you have no real experience, you typically start at an internship or junior level. You shouldn't work at a job you hate, but be open to working at lots of different places. Don't necessarily aim for the hardest-to-get-into companies (like OpenAI or Anthropic) right away — there are tons of companies out there that can be really interesting.

0 replies

pamelafox · 2026-05-20T20:22:02Z

pamelafox
May 20, 2026
Author

2026/05/12: What's the best approach for adding knowledge to agents — Karpathy Wiki, Vector RAG, or Knowledge Graphs?

📹 20:15

Pablo asked about the optimal retrieval approach for expert agents that need completeness, correctness, and pattern detection. Pamela agreed that different retrieval mechanisms work better for different types of queries and sources. Key points:

Multi-source RAG with specialized retrieval layers: Azure AI Search supports multi-source RAG where each source implements its own retrieval mechanism (e.g., Fabric ontology uses graph-like queries, Work IQ has its own retrieval approach). You can send queries in parallel to different sources using customized retrieval, then merge and rank results using reciprocal rank fusion or semantic ranking models.

Completeness is the hardest problem: The only way to truly know you've done a complete search is to look at everything. Azure AI Search has a semantic classification model that checks whether retrieved documents fully answer the query, and if not, performs another search. You can implement similar logic in your agent — break the question down into information needs, check which are covered by results, and keep searching if needs remain unmet.

Graph RAG for completeness: Graph RAG is good for completeness if the ontology is thorough, since you can keep following nodes until all are traversed. However, it's expensive, so you'd want a routing layer that identifies "deep queries" requiring comprehensive search versus queries that basic search can handle. You could implement this as an agent with tools for "basic search" and "deep search" with descriptions guiding when to use each.

Related approaches:

Karpathy's LLM Wiki — Ingest sources into summaries, then search via an index file. Works at moderate scale (hundreds of pages)
TypeAgent-py — Structured RAG that can ingest and extract entities and relations
Microsoft GraphRAG — The main Microsoft graph RAG project
LazyGraphRAG — A research paper about cheaper graph RAG (code not yet released)
Hindsight — Agent memory package that uses semantic search, keyword search, graph traversal, and time series in parallel, then merges with a cross-encoder reranking model
Ralph Loop — A persistent AI agent loop that keeps working until complete, useful for experimentation. Codex has a similar feature with /goal.

Links shared:

0 replies

pamelafox · 2026-05-20T20:22:03Z

pamelafox
May 20, 2026
Author

2026/05/12: Can you explain what different IQ use cases like WorkIQ, FabricIQ, and FoundryIQ are?

📹 43:13

Foundry IQ is the higher-level search mechanism. It can both create and search its own indexes, and connect to different sources (including any MCP endpoint). Think of it as a higher-level search that sends queries to multiple sources in parallel, gets back results, and merges them using reciprocal rank fusion and semantic ranking models.

Fabric IQ has two main components:

Ontologies — A graph representation of your data where you define entity types, relationships, and categories. You can send graph-like queries across the ontology, similar to SQL queries on top of your Fabric data.
Data Agents — Higher-level than ontologies, these incorporate their own LLM to try to answer questions directly about your data.

Both need to be enabled at the Fabric tenant level by an admin.

Work IQ is an agent on top of your M365 read-only data (Teams chats, emails, calendar events). It's available as a CLI and an MCP server. It can query but not create/edit/delete. It's easier to integrate from a permissions perspective for querying M365. You can use it directly from the command line or point any MCP-supporting tool at its MCP server.

Links shared:

0 replies

pamelafox · 2026-05-20T20:22:04Z

pamelafox
May 20, 2026
Author

2026/05/12: Does Microsoft Agent Framework support a combination of checkpoints and session per agent in a handoff workflow?

📹 47:54

Pamela deferred this to the agent framework team since she wasn't sure of the answer. She pointed to the banking assistant sample by David Imo, which uses both checkpoint storage and thread IDs — it likely has a combination of session and checkpointing. David has done the most experimentation with that combination of features.

For further help, Pamela recommended posting in the agent-framework discussions tab, as the team also runs their own office hours.

Links shared:

0 replies

pamelafox · 2026-05-20T20:22:05Z

pamelafox
May 20, 2026
Author

2026/05/12: Announcements

📹 0:00

Claude models on Foundry: Claude models (Sonnet, Haiku, Opus) are now available through Microsoft Foundry. See aka.ms/claude/start.

GPT-5.5 in GitHub Copilot: GPT-5.5 is now generally available for GitHub Copilot.

GitHub Copilot pricing changes: Sign-ups are paused and rate limits are now visible. Usage-based billing starts June 1st.

Microsoft Agent Framework + Anthropic client: The agent framework now supports Anthropic models.

OpenAI winding down fine-tuning: OpenAI is deprecating the self-serve fine-tuning API. Pamela was checking whether this also affects Azure OpenAI fine-tuning.

PyCon US this week: Pamela is running a tutorial on building MCP servers and presenting at the EduSummit on AI-powered slides.

Build 2026: A lab session on Foundry IQ is in development: Build26-LAB532.

New Mac M5 Max: Pamela got a new Mac and is exploring local SLMs on it.

0 replies

Microsoft Foundry

Python + AI Weekly Office Hours: Recordings & Resources #280

Uh oh!

Uh oh!

pamelafox Jan 7, 2026

May 19, 2026

May 12, 2026

April 28, 2026

April 20, 2026

April 13, 2026

April 7, 2026

March 31, 2026

March 24, 2026

March 17, 2026

February 17, 2026

February 10, 2026

February 3, 2026

January 27, 2026

January 20, 2026

January 13, 2026

January 6, 2025

Replies: 147 comments · 3 replies

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 8, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

pamelafox Jan 13, 2026 Author

Uh oh!

kinthaiofficial Apr 29, 2026

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

Uh oh!

pamelafox May 20, 2026 Author

pamelafox
Jan 7, 2026

Replies: 147 comments 3 replies

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 8, 2026
Author

pamelafox
Jan 13, 2026
Author

pamelafox
Jan 13, 2026
Author

pamelafox
Jan 13, 2026
Author

pamelafox
Jan 13, 2026
Author

pamelafox
Jan 13, 2026
Author

pamelafox
Jan 13, 2026
Author

kinthaiofficial
Apr 29, 2026

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author

pamelafox
May 20, 2026
Author