This repository contains code for my live course: O'Reilly Live Online Training - Building AI Agents and Workflows with LangGraph
Instructor: Sinan Ozdemir
This course teaches you how to build production-ready AI agents and workflows using LangGraph. You'll progress from basic graph concepts through tool integration, RAG workflows, middleware for context engineering, evaluation-in-the-loop, and a full capstone deep research agent.
- Intermediate Python
- Familiarity with LangChain basics (prompts, chains, LLMs)
- An OpenRouter API key (required) — gateway to multiple LLM providers
- A SerpAPI key (required, free tier available) — web search
- Optional: LangSmith API key for tracing/observability
| # | Notebook | Topic | Key Concepts |
|---|---|---|---|
| 1 | From Prompts to Workflows | From Prompts to Workflows | Single prompt vs. generate → critique → refine graph |
| 2 | LangGraph Basics | LangGraph Core Primitives | StateGraph, TypedDict, add_messages, conditional edges, checkpoints |
| 3 | Tools and Agents | Tools, ReAct Agents, MCP | @tool, create_agent, manual ReAct, MCP integration with custom server |
| 4 | RAG Workflow | RAG as a LangGraph Workflow | Ingestion pipeline (scrape → chunk → embed), Chroma + HuggingFace embeddings, document grading, web search fallback |
| 5 | Middleware | Context Engineering | SummarizationMiddleware, HumanInTheLoopMiddleware, stacking middleware, LangSmith tracing, custom GuardrailMiddleware |
| 5.1 | Context Summarization Benchmark | Compression Strategies (Advanced) | Rule-following stress test, five compression strategies, custom middleware in middleware.py, LLM-as-judge + heatmaps |
| 6 | Evaluation | Agent Evaluation | LLM-as-judge, trajectory scoring, evaluation-in-the-loop, self-correcting agents, multi-model comparison |
| 6.1 | Structured Grader Evaluation | Grader Benchmark (Advanced) | 0–3 structured rubric, Pydantic with_structured_output, gold-label test set, multi-model grader accuracy, disagreement analysis |
| 7 | Deep Research Agent | Capstone | Planner → researcher → writer → reviewer with revision loop, SummarizationMiddleware + ModelCallLimitMiddleware, supervisor-with-tools pattern |
| 8 | Parallel Graphs | Fan-Out / Fan-In | Send API, dynamic parallel dispatch, operator.add reducer, sequential vs. parallel timing |
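
The generate → critique → refine loop from notebook 1 can be sketched as a small graph with one conditional edge. This is a minimal illustration, not the notebook's code: the state fields, the `MAX_REVISIONS` cap, and the three node bodies are hypothetical stand-ins for real LLM calls, and `build_graph` assumes a recent `langgraph` release that exports `StateGraph`, `START`, and `END` from `langgraph.graph`.

```python
from typing import TypedDict


class DraftState(TypedDict):
    task: str
    draft: str
    feedback: str
    revisions: int


MAX_REVISIONS = 2  # hypothetical cap so the loop always terminates


def generate(state: DraftState) -> dict:
    # Stand-in for an LLM call that writes a first draft.
    return {"draft": f"Draft answer for: {state['task']}", "revisions": 0}


def critique(state: DraftState) -> dict:
    # Stand-in for an LLM critic: approve only drafts that include an example.
    has_example = "example" in state["draft"].lower()
    return {"feedback": "OK" if has_example else "Add a concrete example."}


def refine(state: DraftState) -> dict:
    # Stand-in for an LLM rewrite that applies the feedback.
    return {
        "draft": state["draft"] + " Example: see the worked case.",
        "revisions": state["revisions"] + 1,
    }


def route(state: DraftState) -> str:
    # Conditional edge: stop when approved or when the revision budget is spent.
    if state["feedback"] == "OK" or state["revisions"] >= MAX_REVISIONS:
        return "done"
    return "refine"


def build_graph():
    # Imported here so the node functions above stay usable without langgraph.
    from langgraph.graph import END, START, StateGraph

    builder = StateGraph(DraftState)
    builder.add_node("generate", generate)
    builder.add_node("critique", critique)
    builder.add_node("refine", refine)
    builder.add_edge(START, "generate")
    builder.add_edge("generate", "critique")
    builder.add_conditional_edges("critique", route, {"refine": "refine", "done": END})
    builder.add_edge("refine", "critique")
    return builder.compile()
```

Calling `build_graph().invoke({"task": "Explain reducers"})` runs the loop; with these stubs the critic approves the draft after one refinement. The state key is named `feedback` rather than `critique` because LangGraph rejects node names that collide with state keys.
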
Supporting files (run notebooks from the `notebooks/` directory):
- `middleware.py`: custom conversation-compression middleware used by notebook 5.1 (`MapReduceSummarizationMiddleware`, `RefineSummarizationMiddleware`, `RulesFirstSummaryMiddleware`, and more)
- `data/bulk_email_conversation.json`: scripted ~84-message email-triage conversation for the notebook 5.1 rule-following benchmark
- Notebook 6.1 writes `data/structured_grader_evaluation_results.json` and `data/structured_grader_evaluation_summary.csv` when you run the save cell
- Notebook 1 uses a "generate → critique → refine" pattern (no tools needed) to motivate why graphs beat single prompts
- Notebook 3 includes a working MCP server (`mcp_server.py`) and demonstrates loading external tools via `langchain-mcp-adapters`
- Notebook 4 scrapes real blog posts from AI Office Hours and builds a LangGraph ingestion pipeline before the RAG query workflow
- Notebook 5 walks through `SummarizationMiddleware` and `HumanInTheLoopMiddleware`, stacking multiple middleware on one agent, optional LangSmith tracing, and a custom `GuardrailMiddleware` (PII blocking) plus decorator-style hooks; the closing exercise points you at `ModelCallLimitMiddleware` in the docs
- Notebook 5.1 benchmarks five compression strategies on a long scripted conversation (three user rules must survive summarization) using `middleware.py` and `data/bulk_email_conversation.json`; it ends with LLM-as-judge pass/fail grids and matplotlib heatmaps across model pairings
- Notebook 6 compares multiple models against a gold evaluation set using a structured rubric judge
- Notebook 6.1 (advanced) benchmarks nine LLMs as graders on a 12-case gold set with a 0–3 correctness rubric, chain-of-thought structured outputs, accuracy tables, matplotlib charts, and a disagreement walkthrough; only `OPENROUTER_API_KEY` is required
- Notebook 7 hardens the researcher sub-agent with `SummarizationMiddleware` and `ModelCallLimitMiddleware`, then ends with a "supervisor with tools" multi-agent pattern (no `langgraph-supervisor` dependency, just a ReAct agent whose tools are sub-agents)
- Notebook 8 demonstrates LangGraph's `Send` API for dynamic fan-out/fan-in with a timing comparison showing parallel vs. sequential speedup
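
The fan-out/fan-in pattern from notebook 8 can be sketched as follows. This is an illustrative outline rather than the notebook's code: the state classes, the `summarize` worker, and the `build_graph` helper are hypothetical names, and the example assumes a recent `langgraph` release where `Send` is importable from `langgraph.types`. The `operator.add` reducer on `summaries` is what lets results from parallel branches be concatenated instead of overwriting each other.

```python
import operator
from typing import Annotated, TypedDict


class OverallState(TypedDict):
    topics: list[str]
    # Reducer: lists returned by parallel branches are concatenated.
    summaries: Annotated[list[str], operator.add]


class WorkerState(TypedDict):
    topic: str


def summarize(state: WorkerState) -> dict:
    # Stand-in for an LLM call; each branch returns a one-item list.
    return {"summaries": [f"summary of {state['topic']}"]}


def build_graph():
    # Imported here so the worker above stays usable without langgraph.
    from langgraph.graph import END, START, StateGraph
    from langgraph.types import Send

    def fan_out(state: OverallState):
        # One Send per topic: the number of branches is decided at runtime.
        return [Send("summarize", {"topic": t}) for t in state["topics"]]

    builder = StateGraph(OverallState)
    builder.add_node("summarize", summarize)
    builder.add_conditional_edges(START, fan_out, ["summarize"])
    builder.add_edge("summarize", END)
    return builder.compile()
```

Invoking the compiled graph with `{"topics": ["rag", "agents"], "summaries": []}` should dispatch one `summarize` branch per topic and merge the results into `summaries`.
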
At the time of writing, we need a Python virtual environment with Python 3.11.

    python3.11 --version

    python3.11 -m venv .venv
    source .venv/bin/activate  # Linux/macOS
    # .venv\Scripts\activate   # Windows

    pip install -r requirements.txt

    cp .env.example .env
    # Edit .env with your API keys

All notebooks use OpenRouter as the LLM gateway (via `ChatOpenAI` with a custom `base_url`), so you can swap models by changing the model string (e.g., `openai/gpt-4.1`, `anthropic/claude-sonnet-4`, etc.).
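
A minimal sketch of the OpenRouter setup described above, assuming the `langchain-openai` package and an `OPENROUTER_API_KEY` environment variable. The helper names `openrouter_kwargs` and `make_llm` are hypothetical, and the base URL is an assumption to confirm against the OpenRouter docs; the notebooks may wire this up differently.

```python
import os

# Assumed OpenRouter endpoint; confirm in the OpenRouter documentation.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def openrouter_kwargs(model: str, temperature: float = 0.0) -> dict:
    """Build ChatOpenAI keyword arguments that point at OpenRouter."""
    return {
        "model": model,  # e.g. "openai/gpt-4.1" or "anthropic/claude-sonnet-4"
        "base_url": OPENROUTER_BASE_URL,
        "api_key": os.environ["OPENROUTER_API_KEY"],  # fails fast if the key is missing
        "temperature": temperature,
    }


def make_llm(model: str):
    # Imported lazily so the helper above works without langchain-openai installed.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(**openrouter_kwargs(model))
```

Swapping providers is then a one-string change, for example `make_llm("anthropic/claude-sonnet-4")` instead of `make_llm("openai/gpt-4.1")`.
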
Web search uses SerpAPI (free tier available).
Embeddings in the RAG notebook use HuggingFace (all-MiniLM-L6-v2) — runs locally, no API key needed.
- oreilly-ai-agents — evaluation, tool selection, positional bias
- building-agentic-ai — deep research, multi-agent SDR, policy bots
Sinan Ozdemir — LinkedIn | GitHub | AI Office Hours | Building Agentic AI Substack
