Python + AI Weekly Office Hours: Recordings & Resources #280
Replies: 147 comments 3 replies
-
|
2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to? Yes, this is already happening quite a bit. Common use cases include:
A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL. The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: How do you set up Entra OBO (On-Behalf-Of) flow for Python MCP servers? 📹 5:48 The demo showed how to use the Graph API with the OBO flow to find out the groups of a signed-in user and use that to decide whether to allow access to a particular tool. The flow works as follows:
For the authentication dance, FastMCP handles the DCR (Dynamic Client Registration) flow since Entra itself doesn't support DCR natively. To test from scratch:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: Which MCP inspector should I use for testing servers with Entra authentication? 📹 20:24 The standard MCP Inspector doesn't work well with Entra authentication because it doesn't do the DCR (Dynamic Client Registration) dance properly. MCP Jam is recommended instead because it properly handles the OAuth flow with DCR. To set it up:
MCP Jam also has nice features like:
One note: enum values in tools don't yet show as dropdowns in MCP Jam (issue to be filed). Links shared: What's the difference between MCP Jam and LM Studio? 📹 34:19 LM Studio is primarily for playing around with LLMs locally. MCP Jam has some overlap since it includes a chat interface with access to models, but its main purpose is to help you develop MCP servers and apps. It's focused on the development workflow rather than just chatting with models. |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: How do you track LLM usage tokens and costs? 📹 28:04 For basic tracking, Azure portal shows metrics for token usage in your OpenAI accounts. You can see input tokens and output tokens in the metrics section. You can also:
If you use multiple providers, you need a way to consolidate the tracking. OpenTelemetry metrics could work but you'd need a way to hook into each system. |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: How do you keep yourself updated with all the new changes related to AI? 📹 30:32 Several sources recommended:
Particularly recommended:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: How do you build a Microsoft Copilot agent in Python with custom API calls? 📹 36:30 For building agents that work with Microsoft 365 Copilot (which appears in Windows Copilot and other Microsoft surfaces):
The agent framework team is responsive if there are issues. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: As a backend developer with a non-CS background, how do I learn about AI from scratch? 📹 46:39 Recommended approach:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: What's new with the RAG demo (azure-search-openai-demo) after the SharePoint data source was added? 📹 49:50 The main work is around improving ACL (Access Control List) support. The cloud ingestion feature was added recently, but it doesn't yet support ACLs. The team is working on making ACLs compatible with all features including:
A future feature idea: adding an MCP server to the RAG repo for internal documentation use cases, leveraging the Entra OBO flow for access control. |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/06: Do you think companies will create internal MCP servers for AI apps to connect to? 📹 53:53 Yes, this is already happening quite a bit. Common use cases include:
A particularly valuable use case is data science/engineering teams creating MCP servers that enable less technical folks (marketing, PMs, bizdev) to pull data safely without needing to write SQL. The pattern often starts with an engineer building an MCP server for themselves, sharing it with colleagues, adding features based on their needs, and growing from there. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: What advantages do other formats have over .txt for prompts? How do you improve prompts with DSPy and evals? 📹 4:55 Prompty is a template format that mixes Jinja and YAML together. The YAML goes at the top for metadata, and the rest is Jinja templating. Jinja is the most common templating system for Python (used by Flask, etc.). The nice thing about Jinja is you can pass in template variables—useful for customization, passing in citations, etc. Prompty turns the file into a Python list of chat messages with roles and contents. However, we're moving from Prompty to plain Jinja files because:
Recommendation: Keep prompts separate from code when possible, especially long system prompts. Use plain .txt or .md if you don't need variables, or Jinja if you want to render variables. With agents and tools, some LLM-facing text (like tool descriptions in docstrings) will inevitably live in your code—that's fine. For iterating on prompts: Run evaluations, change the prompt, and see whether it improves things. There are tools like DSPy and Agent Framework's Lightning that do automated prompt optimization/fine-tuning. Lightning says it "fine-tunes agents" but may actually be doing prompt changes. Most of the time, prompt changes don't make a huge difference, but sometimes they might. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: What is the future of AI and which specialization should I pursue? 📹 11:54 If you enjoy software engineering and full-stack engineering, it's more about understanding the models so you understand why they do what they do, but it's really about how you're building on top of those models. There's lots of interesting stuff to learn, and it really depends on you and what you're most interested in doing. |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: Which livestream series should I follow to build a project using several tools and agents, and should I use a framework? 📹 13:33 Everyone should understand tool calling before moving on to agents. From the original 9-part Python + AI series, start with tool calling, then watch the high-level agents overview. The upcoming six-part series in February will dive deeper into each topic, especially how to use Agent Framework. At the bare minimum, you should understand LLMs, tool calling, and agents. Then you can decide whether to do everything with just tool calling (you can do it yourself with an LLM that has tool calling) or use an agent framework like LangChain or Agent Framework if you think it has enough benefits for you. It's important to understand that agents are based on tool calling—it's the foundation of agents. The success and failure of agents has to do with the ability of LLMs to use tool calling. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: How does Azure manage the context window? How do I maintain a long conversation with a small context window? 📹 15:21 There are three general approaches:
With today's large context windows (128K, 256K), it's often easier to just wait for an error and tell the user to start a new chat, or do summarization when the error occurs. This approach is most likely to work across models since every model should throw an error when you're over the context window. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: How do we deal with context rot and how do we summarize context using progressive disclosure techniques? 📹 19:17 Read through Kelly Hong's (Chroma researcher) blog post on context rot. The key point is that even with a 1 million token context window, you don't have uniform performance across that context window. She does various tests to see when performance starts getting worse, including tests on ambiguity, distractors, and implications. A general tip for coding agents with long-running tasks: use a main agent that breaks the task into subtasks and spawns sub-agents for each one, where each sub-agent has its own focused context. This is the approach used by the LangChain Deep Agents repo. You can also look at how different projects implement summarization. LangChain's summarization middleware is open source—you can see their summary prompt and approach. They do approximate token counting and trigger summarization when 80% of the context is reached. Links shared:
How do I deal with context issues when using the Foundry SDK with a single agent? 📹 25:03 If you're using the Foundry SDK with a single agent (hosted agent), you can implement something like middleware through hooks or events. Another approach is the LangChain Deep Agents pattern: implement sub-agents as tools where each tool has a limited context and reports back a summary of its results to the main agent. For the summarization approach with Foundry agents, you'd need to figure out what events, hooks, or middleware systems they have available. |
Beta Was this translation helpful? Give feedback.
-
|
2026/01/13: Have you seen or implemented anything related to AG-UI or A2UI? 📹 29:02 AG-UI (Agent User Interaction Protocol) is an open standard introduced by the CopilotKit team that standardizes how front-end applications communicate with AI agents. Both Pydantic AI and Microsoft Agent Framework have support for AG-UI—they provide adapters to convert messages to the AG-UI format. The advantage of standardization is that if people agree on a protocol between backend and frontend, it means you can build reusable front-end components that understand how to use that backend. Agent Framework also supports different UI event stream protocols, including Vercel AI (though Vercel is a competitor, so support may be limited). These are adapters—you can always adapt output into another format if needed, but it's nice when it's built in. A2UI is created by Google with Consortium CopilotKit and relates to A2A (Agent-to-Agent). A2UI appears to be newer with less support currently in Agent Framework, though A2A is supported. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
For teams moving from single-agent Python prototypes to multi-agent production, the Python patterns that seem natural become anti-patterns at scale. A few common issues: Sharing OpenAI/Anthropic clients across agents — fine in prototypes, breaks in production. Each agent should have its own client (or at least its own rate limit tracking) because a single starved agent can exhaust the shared client's rate limits, silently degrading all other agents. Using Python asyncio for agent concurrency — the event loop becomes a bottleneck. Agent LLM calls are I/O-bound but the response processing (context assembly, memory updates) can be CPU-bound. Better: use separate worker processes with message queues between them. Context = conversation history — the natural pattern in Python AI code is to append every turn to a list and pass it as messages. At 100+ turns, this becomes a context explosion. You need progressive compaction: summarize older turns, keep recent ones verbatim. No budget tracking = production accidents — Python makes it easy to Naive retry logic amplifies costs — These patterns from production: https://blog.kinthai.ai/agent-wallet-economic-models-autonomous-agents |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/19: Which TTS model do you recommend using locally (offline)? 📹 22:32 Pamela hasn't used local TTS models much herself, but noted that most people recommend Whisper or using Whisper inside some kind of harness for offline speech. She mentioned that she just got a computer with 64 GB of RAM, so she's hoping to start experimenting more with local models. She also checked CanIRun.ai but it doesn't seem to list TTS models. She encouraged others in the chat to share their recommendations for offline TTS. |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/19: How can I experiment with generative UI / A2UI? 📹 24:08 For generative UI, there are a few options depending on what you're building:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/19: What is Pydantic Monty and how can it be used with agents? 📹 29:46 Pydantic Monty is a sandboxed Python runtime (re-implemented in Rust) that lets you run a subset of Python safely. There are two main use cases for agents: 1. Code mode (reducing tool-calling overhead): Instead of the traditional back-and-forth of many individual tool calls, the LLM writes Python code that calls tools as if they were functions. This code can do sequences, for loops, async gathers, etc. — all in one shot. This dramatically reduces tokens and round trips. If you're using Pydantic AI, it's one line of code to add Monty as a code mode backend. For Agent Framework, there's a PR in review that adds Monty as a CodeAct provider. 2. Code execution tool: You can add a code execution tool to your agent that uses Monty for fast, safe computation. For example, giving an agent the ability to do date/time math or calculations without relying on the LLM's (often unreliable) arithmetic. You could even have two code execution tools: a fast Monty tool for datetime/math, and a slower hosted code interpreter (with access to pandas, numpy, etc.) for more complex tasks. Samuel Colvin (Pydantic's creator) gave an extended talk about Monty at PyCon US, explaining how they re-implemented Python in Rust. There's also a "Hack Monty" competition with a $10,000 bounty to find security issues, which helps validate the sandbox's safety. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/19: Can I use GitHub Copilot (VS Code) with a model deployed in Foundry? 📹 38:22 Yes! You can add Foundry models to VS Code via Manage Language Models (in the model picker dropdown). This creates a During the session, Pamela had trouble getting the "Azure" provider option to work. After live-debugging with attendees and reading the VS Code source code, she discovered the Azure provider only recognizes URLs containing Option 1: Azure vendor (for Azure OpenAI models only) For OpenAI models on Foundry, you can use {
"name": "Work Foundry",
"vendor": "azure",
"models": [
{
"id": "gpt-5.2",
"name": "gpt-5.2-work",
"url": "https://YOUR-ACCOUNT.openai.azure.com",
"toolCalling": true,
"vision": true,
"maxInputTokens": 128000,
"maxOutputTokens": 16000
}
]
}Option 2: Custom endpoint vendor (for ANY model, including non-OpenAI — requires API key) For non-OpenAI models (like Claude) or if you want a single configuration for all models, use {
"name": "Personal Foundry",
"vendor": "customendpoint",
"apiKey": "${input:chat.lm.secret.YOUR_SECRET_ID}",
"apiType": "responses",
"models": [
{
"id": "gpt-5.4",
"name": "gpt-5.4-mine",
"url": "https://YOUR-FOUNDRY.openai.azure.com/openai/v1/responses",
"apiType": "responses",
"toolCalling": true,
"vision": true,
"maxInputTokens": 128000,
"maxOutputTokens": 16000
},
{
"id": "claude-sonnet-4-6-DEPLOYMENT",
"name": "claude-sonnet-mine",
"url": "https://YOUR-FOUNDRY.services.ai.azure.com/anthropic/v1/messages",
"apiType": "messages",
"toolCalling": true,
"vision": true,
"maxInputTokens": 128000,
"maxOutputTokens": 16000
}
]
}Key details:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/19: Announcements 📹 0:00 PyCon US recap:
GitHub Copilot App (desktop technical preview): 📹 4:57 The new GitHub Copilot App is a desktop app wrapping the Copilot CLI, available for Enterprise and Business subscribers. Key features demonstrated:
New MAI models in Foundry: 📹 11:46 Three new MAI models are available:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Where could I start learning about Python and AI? 📹 0:31 Pamela recommended her blog post about how she personally learns about generative AI, which links to the Python + AI video series at the bottom. The most important thing is to actually try things hands-on — that's the way you really learn. The Python + AI video series is designed for someone who knows some Python but doesn't yet know generative AI. All the videos are free to watch on YouTube. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Do you have links for free Azure credits to use Foundry? 📹 1:48 Pamela doesn't have coupons to give out. The Azure free trial exists, but it likely doesn't work for Foundry model usage. Foundry's pricing page says it's "free to use and explore," but that likely just means you can browse the leaderboard and poke around — as soon as you start using models, you need to pay. That said, if you're just doing a few small tests, it's not very expensive — Pamela got a bill of about a dollar for her tests with Anthropic models building an agent. You really only need a proper payment setup if you're developing an app or doing evaluations that require many calls. |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: How did the Code with Claude workshop use Foundry? 📹 4:13 Pamela attended the Code with Claude conference and ran a workshop using Claude models (Sonnet, Haiku, and Opus) deployed from Foundry. They used Microsoft Agent Framework pointed at an Anthropic model deployment to build a cupcake store ordering agent. The workshop repo shows how to set up the deployment name, API key, and endpoint. You can also watch a video of Pamela walking through the agent. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: With the Copilot pricing changes and 9x multiplier, do you have any recommendations on getting around this? 📹 7:07 GitHub Copilot is moving to usage-based billing starting June 1st, based on token usage. Pamela shared several free alternatives for prototyping:
For local models, Pamela noted that QPTD OS is good for agentic stuff, and she planned to download it for her new machine. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Can you give advice to AI student graduates on what skills to pick up to get interviews lined up? 📹 14:03 For software engineering roles, Pamela recommended:
The best approach is to find a job posting you're interested in, identify the skills it requires, and work on those specific skills. Can we be picky on which companies we want when we have no real experience? 📹 19:00 If you have no real experience, you typically start at an internship or junior level. You shouldn't work at a job you hate, but be open to working at lots of different places. Don't necessarily aim for the hardest-to-get-into companies (like OpenAI or Anthropic) right away — there are tons of companies out there that can be really interesting. |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: What's the best approach for adding knowledge to agents — Karpathy Wiki, Vector RAG, or Knowledge Graphs? 📹 20:15 Pablo asked about the optimal retrieval approach for expert agents that need completeness, correctness, and pattern detection. Pamela agreed that different retrieval mechanisms work better for different types of queries and sources. Key points: Multi-source RAG with specialized retrieval layers: Azure AI Search supports multi-source RAG where each source implements its own retrieval mechanism (e.g., Fabric ontology uses graph-like queries, Work IQ has its own retrieval approach). You can send queries in parallel to different sources using customized retrieval, then merge and rank results using reciprocal rank fusion or semantic ranking models. Completeness is the hardest problem: The only way to truly know you've done a complete search is to look at everything. Azure AI Search has a semantic classification model that checks whether retrieved documents fully answer the query, and if not, performs another search. You can implement similar logic in your agent — break the question down into information needs, check which are covered by results, and keep searching if needs remain unmet. Graph RAG for completeness: Graph RAG is good for completeness if the ontology is thorough, since you can keep following nodes until all are traversed. However, it's expensive, so you'd want a routing layer that identifies "deep queries" requiring comprehensive search versus queries that basic search can handle. You could implement this as an agent with tools for "basic search" and "deep search" with descriptions guiding when to use each. Related approaches:
Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Can you explain what different IQ use cases like WorkIQ, FabricIQ, and FoundryIQ are? 📹 43:13 Foundry IQ is the higher-level search mechanism. It can both create and search its own indexes, and connect to different sources (including any MCP endpoint). Think of it as a higher-level search that sends queries to multiple sources in parallel, gets back results, and merges them using reciprocal rank fusion and semantic ranking models. Fabric IQ has two main components:
Both need to be enabled at the Fabric tenant level by an admin. Work IQ is an agent on top of your M365 read-only data (Teams chats, emails, calendar events). It's available as a CLI and an MCP server. It can query but not create/edit/delete. It's easier to integrate from a permissions perspective for querying M365. You can use it directly from the command line or point any MCP-supporting tool at its MCP server. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Does Microsoft Agent Framework support a combination of checkpoints and session per agent in a handoff workflow? 📹 47:54 Pamela deferred this to the agent framework team since she wasn't sure of the answer. She pointed to the banking assistant sample by David Imo, which uses both checkpoint storage and thread IDs — it likely has a combination of session and checkpointing. David has done the most experimentation with that combination of features. For further help, Pamela recommended posting in the agent-framework discussions tab, as the team also runs their own office hours. Links shared: |
Beta Was this translation helpful? Give feedback.
-
|
2026/05/12: Announcements 📹 0:00 Claude models on Foundry: Claude models (Sonnet, Haiku, Opus) are now available through Microsoft Foundry. See aka.ms/claude/start. GPT-5.5 in GitHub Copilot: GPT-5.5 is now generally available for GitHub Copilot. GitHub Copilot pricing changes: Sign-ups are paused and rate limits are now visible. Usage-based billing starts June 1st. Microsoft Agent Framework + Anthropic client: The agent framework now supports Anthropic models. OpenAI winding down fine-tuning: OpenAI is deprecating the self-serve fine-tuning API. Pamela was checking whether this also affects Azure OpenAI fine-tuning. PyCon US this week: Pamela is running a tutorial on building MCP servers and presenting at the EduSummit on AI-powered slides. Build 2026: A lab session on Foundry IQ is in development: Build26-LAB532. New Mac M5 Max: Pamela got a new Mac and is exploring local SLMs on it. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Each week, we hold weekly office hours about all things Python + AI in the Foundry Discord.
Join the Discord here: http://aka.ms/aipython/oh
This thread will list the recordings of each office hours, and any other resources that come out of the OH sessions. The questions and answers are automatically posted (based on the transcript) as comments in this thread.
May 19, 2026
Topics covered:
May 12, 2026
Topics covered:
April 28, 2026
Topics covered:
April 20, 2026
Topics covered:
April 13, 2026
Topics covered:
April 7, 2026
Topics covered:
March 31, 2026
Topics covered:
March 24, 2026
Topics covered:
March 17, 2026
Topics covered:
February 17, 2026
Topics covered:
February 10, 2026
Topics covered:
February 3, 2026
Topics covered:
January 27, 2026
Topics covered:
January 20, 2026
Topics covered:
January 13, 2026
Topics covered:
January 6, 2025
Topics covered:
Beta Was this translation helpful? Give feedback.
All reactions