From 19e87f1c5b89c69ffd51af08e3515fc2821fbcc2 Mon Sep 17 00:00:00 2001 From: Annabell Schaefer Date: Mon, 27 Apr 2026 18:01:11 +0900 Subject: [PATCH 1/3] rewrite intro --- content/academy/ai-engineering-loop/meta.json | 6 +++ .../academy/ai-engineering-loop/overview.mdx | 42 ++++++++++++++++ content/academy/index.mdx | 48 +++++++++---------- content/academy/meta.json | 1 + 4 files changed, 73 insertions(+), 24 deletions(-) create mode 100644 content/academy/ai-engineering-loop/meta.json create mode 100644 content/academy/ai-engineering-loop/overview.mdx diff --git a/content/academy/ai-engineering-loop/meta.json b/content/academy/ai-engineering-loop/meta.json new file mode 100644 index 000000000..7721cbb78 --- /dev/null +++ b/content/academy/ai-engineering-loop/meta.json @@ -0,0 +1,6 @@ +{ + "title": "AI Engineering Loop", + "pages": [ + "overview" + ] +} diff --git a/content/academy/ai-engineering-loop/overview.mdx b/content/academy/ai-engineering-loop/overview.mdx new file mode 100644 index 000000000..c8f4b223d --- /dev/null +++ b/content/academy/ai-engineering-loop/overview.mdx @@ -0,0 +1,42 @@ +--- +title: AI Engineering Loop +description: A high-level map of the AI engineering lifecycle, from tracing and monitoring to datasets, experiments, and evaluation. +--- + +# The AI Engineering Loop + +Building with LLMs is an iterative engineering process. Because outputs are probabilistic, teams need a loop for seeing what happened, finding failure modes, testing changes, and deciding what to ship. + +![The AI engineering loop](/images/academy/loop-overview.png) + +The loop is a working model, not a strict waterfall. Teams move through it repeatedly, and different parts of the loop become more important as a product matures. + +## The steps + +### 1. Tracing + +Tracing captures the full path of a request so you can inspect prompts, retrieved context, tool calls, outputs, latency, and cost in one place. 
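As a minimal sketch of what that captured data might look like (these structures are illustrative, not the Langfuse data model):

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    """One step inside a trace, e.g. a retrieval, an LLM call, or a tool call."""
    name: str
    input: str
    output: str
    latency_ms: float
    cost_usd: float = 0.0

@dataclass
class Trace:
    """The full path of a single request through the application."""
    trace_id: str
    spans: list[Span] = field(default_factory=list)

    def total_latency_ms(self) -> float:
        return sum(s.latency_ms for s in self.spans)

    def total_cost_usd(self) -> float:
        return sum(s.cost_usd for s in self.spans)

# One request, reconstructed end to end: retrieval first, then the LLM call.
trace = Trace("req-001", [
    Span("retrieve-context", "user question", "3 matching documents", latency_ms=40.0),
    Span("llm-call", "prompt + retrieved context", "draft answer", latency_ms=850.0, cost_usd=0.002),
])
print(trace.total_latency_ms())  # 890.0
```

Having latency and cost attached to each step is what makes it possible to see where the time and money go inside a single request.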
Read [Tracing](/academy/tracing) for a breakdown of what a useful trace looks like and why traces become the foundation for everything else. + +### 2. Monitoring + +Monitoring turns raw traces into ongoing visibility by tracking trends and surfacing the cases that deserve attention. Read [Monitoring](/academy/monitoring) to understand how teams watch quality, cost, latency, and production failures over time. + +### 3. Datasets + +Datasets turn real scenarios into repeatable test cases so you can check whether a change helps across more than a handful of examples. Read [Datasets](/academy/datasets) for how to structure dataset items and when it makes sense to split or grow a dataset. + +### 4. Experiments + +Experiments let you change one variable at a time and compare outputs against a stable baseline instead of relying on intuition alone. Read [Experiments](/academy/experiments) to see how to isolate variables, compare variants, and learn what actually improved. + +### 5. Evaluation + +Evaluation is how you decide whether results are good enough to ship, using manual review, code-based checks, or LLM judges depending on the task. Read [Evaluate](/academy/evaluate) for how teams score outputs and turn qualitative judgments into a repeatable process. + +## What the loop helps you balance + +Across the loop, teams are usually balancing three things at once: output quality, latency, and cost. The point is not to optimize one number in isolation, but to make tradeoffs explicit and grounded in evidence from your own application. + +## Where the docs fit + +This page gives you the map, and Academy explains the concepts behind each step. When you want Langfuse-specific implementation details, move into the [docs](/docs) and [guides](/guides). 
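The five steps can also be condensed into one illustrative pass through the loop. Everything below is a toy sketch with invented names and data, not a Langfuse API:

```python
# Datasets: real scenarios turned into repeatable test cases.
dataset = [
    {"input": "2 + 2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]

# Two variants of the system (stand-ins for a prompt or model change).
def baseline(question: str) -> str:
    return {"2 + 2": "4", "capital of France": "paris"}[question]

def candidate(question: str) -> str:
    return {"2 + 2": "4", "capital of France": "Paris"}[question]

# Evaluation: here a simple code-based check; LLM judges or manual
# review fill the same role for less clear-cut tasks.
def exact_match(output: str, expected: str) -> float:
    return 1.0 if output.strip() == expected else 0.0

# Experiments: run both variants over the same dataset and compare.
def run_experiment(system, dataset) -> float:
    scores = [exact_match(system(item["input"]), item["expected"]) for item in dataset]
    return sum(scores) / len(scores)

base_score = run_experiment(baseline, dataset)   # 0.5
cand_score = run_experiment(candidate, dataset)  # 1.0
print("ship candidate:", cand_score >= base_score)
```

The point of the sketch is the shape, not the scoring function: a fixed dataset, one isolated change, and a comparison against a stable baseline before anything ships.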
diff --git a/content/academy/index.mdx b/content/academy/index.mdx index 2b9e99035..1f8906da6 100644 --- a/content/academy/index.mdx +++ b/content/academy/index.mdx @@ -1,44 +1,44 @@ --- title: Langfuse Academy -description: Build a mental model for AI engineering. Learn the core disciplines teams rely on as LLM applications move from prototype to production. +description: Understand why LLM engineering is different and how to navigate the full AI engineering lifecycle. --- # Welcome to Langfuse Academy -This is the place to build a mental model for AI engineering. We'll introduce the core disciplines teams rely on as LLM applications move from prototype to production. +Building with LLMs changes the job of engineering teams. Once outputs become probabilistic, a system can be technically healthy and still produce responses that are wrong, incomplete, off-brand, unsafe, or simply not useful. -Rather than focusing on individual product features, Academy is meant to help you understand the bigger picture, and how teams can work with that change in a systematic way. +That changes what teams need to understand and manage. It is no longer enough to ask whether the system ran. You also need a way to reason about output quality, iteration, and the tradeoffs that come with shipping AI products. -## Why LLM observability is different +Langfuse Academy exists to help you build that mental model. It maps the AI engineering lifecycle step by step so you can understand how the pieces fit together and what it takes to move from prototype to production. -Traditional observability remains essential. Teams still need to know whether their systems are up, whether requests are slow, whether dependencies are failing, and whether costs are under control. Those questions do not disappear when an application starts using LLMs. +## Why we are publishing this -But LLM applications introduce a different kind of challenge. 
Their behavior is probabilistic: the same input can produce different outputs, and a response can look plausible even when it is wrong, incomplete, off-brand, unsafe, or simply unhelpful. In other words, a request can succeed technically and still fail for the user. +Langfuse is open source, and we want to open source the conceptual side of AI engineering too. The Academy is our attempt to make the mental models, vocabulary, and workflows behind LLM application development easier to access for everyone. - -TODO: insert a visual or an example here to break up the text - +## Who this is for -AI engineering is not only about reliability. It is also about quality. Teams need to understand whether the output was useful, grounded, safe, and worth the cost. Observability for LLM applications therefore sits closer to product quality and iteration than traditional application monitoring usually does. +- AI engineers building LLM applications and agents +- Software engineers moving into AI product development +- Product managers who need to reason about quality, iteration, and tradeoffs +- People learning the field and trying to understand the core concepts +- Technical and business leaders who need a working model of how AI systems are built and improved -Modern observability platforms for LLM systems increasingly treat prompts, responses, token usage, quality signals, and model-specific behavior as first-class telemetry. +## What you will find here -## The AI engineering loop +Academy follows the AI engineering lifecycle from first visibility into production behavior all the way to structured improvement and evaluation. The goal is to explain why each step exists, what problem it solves, and how the steps connect. -Because of this, AI engineering is iterative. Teams do not build once, ship once, and assume the work is done. They observe behavior, learn from it, improve the system, and evaluate the result over time. 
+Start with [The AI Engineering Loop](/academy/ai-engineering-loop) for the high-level map, then go deeper into the individual parts: -![The AI engineering loop](/images/academy/loop-overview.png) +- [Tracing](/academy/tracing) +- [Monitoring](/academy/monitoring) +- [Datasets](/academy/datasets) +- [Experiments](/academy/experiments) +- [Evaluate](/academy/evaluate) - -TODO: replace with final loop visual; should we explain each step in 1-2 sentences? Keep it concise - +Some pages stay at the conceptual level, and some go deeper into specific disciplines. You can read the full sequence or jump to the topic that is most relevant to your team right now. -## What comes next +## Academy and docs do different jobs -The rest of Langfuse Academy goes deeper into each step of the loop. +Academy focuses on high-level concepts and mental models. The [docs](/docs) and [guides](/guides) cover Langfuse features, product implementation details, and step-by-step how-tos. -Each section is designed to work on its own: it gives you an overview first, and then lets you go deeper if and when that makes sense for your use case. You can follow the full loop, or focus only on the parts that are most relevant for your team right now. - -You also do not need to adopt everything at once. Most teams improve their setup iteratively over time, adding new practices as they become useful. Doing part of this loop is already better than having no LLM engineering practices at all. - -Let's dive in! +Use Academy to understand the lifecycle. Use the docs when you are ready to implement it in Langfuse. 
diff --git a/content/academy/meta.json b/content/academy/meta.json index 895de021c..74f055207 100644 --- a/content/academy/meta.json +++ b/content/academy/meta.json @@ -2,6 +2,7 @@ "title": "Academy", "pages": [ "index", + "ai-engineering-loop", "---The Loop---", "tracing", "monitoring", From b4b824de46fc43712d3ad395c6cbedadd2a6759c Mon Sep 17 00:00:00 2001 From: Annabell Schaefer Date: Wed, 29 Apr 2026 09:08:14 +0900 Subject: [PATCH 2/3] change to engineering loop and overview --- content/academy/ai-engineering-loop.mdx | 71 +++++++++++++++++++ content/academy/ai-engineering-loop/meta.json | 6 -- .../academy/ai-engineering-loop/overview.mdx | 42 ----------- content/academy/datasets/meta.json | 2 +- content/academy/evaluate/meta.json | 2 +- content/academy/experiments/meta.json | 2 +- content/academy/index.mdx | 35 +++++---- 7 files changed, 90 insertions(+), 70 deletions(-) create mode 100644 content/academy/ai-engineering-loop.mdx delete mode 100644 content/academy/ai-engineering-loop/meta.json delete mode 100644 content/academy/ai-engineering-loop/overview.mdx diff --git a/content/academy/ai-engineering-loop.mdx b/content/academy/ai-engineering-loop.mdx new file mode 100644 index 000000000..aa599db70 --- /dev/null +++ b/content/academy/ai-engineering-loop.mdx @@ -0,0 +1,71 @@ +--- +title: AI Engineering Loop +description: A high-level map of the AI engineering lifecycle, from tracing and monitoring to building datasets, experimenting, and evaluating. +--- + +import { Activity, BadgeCheck, Database, FlaskConical, Route } from "lucide-react"; + +# The AI Engineering Loop + +Building with LLMs is not a one-way delivery process. A system can be technically healthy and still fail on output quality, cost, latency, or consistency once it meets real users. + +That is why AI engineering needs a loop. Teams need a way to observe real behavior, identify failure modes, turn those findings into test cases, compare improvements, and decide what is actually worth shipping. 
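That handoff from observation to test case can be sketched in a few lines (the fields here are illustrative, not a fixed schema):

```python
# A production interaction that monitoring flagged (illustrative fields).
flagged_trace = {
    "input": "What is your refund policy?",
    "output": "We do not offer refunds.",  # incorrect for this product
    "user_feedback": "thumbs_down",
}

def to_dataset_item(trace: dict, expected_output: str) -> dict:
    """Promote a flagged trace into a repeatable test case."""
    return {
        "input": trace["input"],
        "expected_output": expected_output,
        "metadata": {"source": "production", "feedback": trace["user_feedback"]},
    }

item = to_dataset_item(flagged_trace, "Refunds are available within 30 days of purchase.")
```

Each flagged failure becomes one more case that future changes are checked against, so the same mistake is caught before it ships a second time.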
+ +![The AI engineering loop](/images/academy/loop-overview.png) + +The loop below is a practical way to think about that work. It connects production visibility with structured improvement, so teams can move from "something feels off" to "we know what changed, why it changed, and whether it is better." + +## Read it as a loop + +Start in production: tracing captures what happened, monitoring tells you what deserves attention, datasets turn recurring patterns into repeatable test cases, experiments isolate changes, and evaluation tells you whether the new version is actually better. + +Once you ship a change, the cycle starts again. The updated system creates new traces, new monitoring signals, and new opportunities to improve. + +## From production signals to better systems + + + } + arrow + > + Capture the full path of a request, including prompts, retrieved context, tool calls, outputs, latency, and cost. + + } + arrow + > + Track how the system behaves over time and surface the traces that deserve attention. + + } + arrow + > + Turn real scenarios into repeatable test cases so you can measure whether a change helps across more than a few examples. + + } + arrow + > + Change one variable at a time, compare it against a stable baseline, and learn what actually improved. + + } + arrow + > + Decide whether results are good enough to ship using manual review, code-based checks, or LLM judges. + + + +## What teams are balancing + +Across the loop, teams are balancing output quality, latency, and cost. The goal is to make those tradeoffs explicit and grounded in evidence from your own application. 
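One way to make those tradeoffs explicit is to compare variants on all three axes against stated budgets, rather than collapsing everything into a single score. The numbers below are invented for illustration:

```python
variants = {
    "baseline":  {"quality": 0.78, "p95_latency_ms": 1200, "cost_per_req_usd": 0.004},
    "candidate": {"quality": 0.81, "p95_latency_ms": 2100, "cost_per_req_usd": 0.009},
}

def budget_violations(base: dict, cand: dict,
                      max_latency_ms: float = 1500,
                      max_cost_usd: float = 0.005) -> list[str]:
    """Flag every axis where the candidate regresses or exceeds a budget."""
    flags = []
    if cand["quality"] < base["quality"]:
        flags.append("quality")
    if cand["p95_latency_ms"] > max_latency_ms:
        flags.append("latency")
    if cand["cost_per_req_usd"] > max_cost_usd:
        flags.append("cost")
    return flags

print(budget_violations(variants["baseline"], variants["candidate"]))
# ['latency', 'cost']: quality improved, but the other two axes are over budget
```

Written down like this, "is the new version better?" turns into a concrete question: is the quality gain worth blowing the latency and cost budgets, or do the budgets move?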
diff --git a/content/academy/ai-engineering-loop/meta.json b/content/academy/ai-engineering-loop/meta.json deleted file mode 100644 index 7721cbb78..000000000 --- a/content/academy/ai-engineering-loop/meta.json +++ /dev/null @@ -1,6 +0,0 @@ -{ - "title": "AI Engineering Loop", - "pages": [ - "overview" - ] -} diff --git a/content/academy/ai-engineering-loop/overview.mdx b/content/academy/ai-engineering-loop/overview.mdx deleted file mode 100644 index c8f4b223d..000000000 --- a/content/academy/ai-engineering-loop/overview.mdx +++ /dev/null @@ -1,42 +0,0 @@ ---- -title: AI Engineering Loop -description: A high-level map of the AI engineering lifecycle, from tracing and monitoring to datasets, experiments, and evaluation. ---- - -# The AI Engineering Loop - -Building with LLMs is an iterative engineering process. Because outputs are probabilistic, teams need a loop for seeing what happened, finding failure modes, testing changes, and deciding what to ship. - -![The AI engineering loop](/images/academy/loop-overview.png) - -The loop is a working model, not a strict waterfall. Teams move through it repeatedly, and different parts of the loop become more important as a product matures. - -## The steps - -### 1. Tracing - -Tracing captures the full path of a request so you can inspect prompts, retrieved context, tool calls, outputs, latency, and cost in one place. Read [Tracing](/academy/tracing) for a breakdown of what a useful trace looks like and why traces become the foundation for everything else. - -### 2. Monitoring - -Monitoring turns raw traces into ongoing visibility by tracking trends and surfacing the cases that deserve attention. Read [Monitoring](/academy/monitoring) to understand how teams watch quality, cost, latency, and production failures over time. - -### 3. Datasets - -Datasets turn real scenarios into repeatable test cases so you can check whether a change helps across more than a handful of examples. 
Read [Datasets](/academy/datasets) for how to structure dataset items and when it makes sense to split or grow a dataset. - -### 4. Experiments - -Experiments let you change one variable at a time and compare outputs against a stable baseline instead of relying on intuition alone. Read [Experiments](/academy/experiments) to see how to isolate variables, compare variants, and learn what actually improved. - -### 5. Evaluation - -Evaluation is how you decide whether results are good enough to ship, using manual review, code-based checks, or LLM judges depending on the task. Read [Evaluate](/academy/evaluate) for how teams score outputs and turn qualitative judgments into a repeatable process. - -## What the loop helps you balance - -Across the loop, teams are usually balancing three things at once: output quality, latency, and cost. The point is not to optimize one number in isolation, but to make tradeoffs explicit and grounded in evidence from your own application. - -## Where the docs fit - -This page gives you the map, and Academy explains the concepts behind each step. When you want Langfuse-specific implementation details, move into the [docs](/docs) and [guides](/guides). 
diff --git a/content/academy/datasets/meta.json b/content/academy/datasets/meta.json index 68592e861..6dd956a9b 100644 --- a/content/academy/datasets/meta.json +++ b/content/academy/datasets/meta.json @@ -1,5 +1,5 @@ { - "title": "Datasets", + "title": "Building Datasets", "pages": [ "overview" ] diff --git a/content/academy/evaluate/meta.json b/content/academy/evaluate/meta.json index f1bf37585..bd19c5454 100644 --- a/content/academy/evaluate/meta.json +++ b/content/academy/evaluate/meta.json @@ -1,5 +1,5 @@ { - "title": "Evaluate", + "title": "Evaluating", "pages": [ "overview" ] diff --git a/content/academy/experiments/meta.json b/content/academy/experiments/meta.json index 64169544b..c3e2443c7 100644 --- a/content/academy/experiments/meta.json +++ b/content/academy/experiments/meta.json @@ -1,5 +1,5 @@ { - "title": "Experiments", + "title": "Experimenting", "pages": [ "overview" ] diff --git a/content/academy/index.mdx b/content/academy/index.mdx index 1f8906da6..858d3feab 100644 --- a/content/academy/index.mdx +++ b/content/academy/index.mdx @@ -3,42 +3,39 @@ title: Langfuse Academy description: Understand why LLM engineering is different and how to navigate the full AI engineering lifecycle. --- -# Welcome to Langfuse Academy +# Welcome to Langfuse Academy test -Building with LLMs changes the job of engineering teams. Once outputs become probabilistic, a system can be technically healthy and still produce responses that are wrong, incomplete, off-brand, unsafe, or simply not useful. +Building with LLMs changes what it means for a system to work. Outputs are probabilistic. A system can run fine and still produce responses that are wrong, off-brand, or useless. Teams need to reason about quality, cost, latency, and the tradeoffs between them. -That changes what teams need to understand and manage. It is no longer enough to ask whether the system ran. 
You also need a way to reason about output quality, iteration, and the tradeoffs that come with shipping AI products. - -Langfuse Academy exists to help you build that mental model. It maps the AI engineering lifecycle step by step so you can understand how the pieces fit together and what it takes to move from prototype to production. +Langfuse Academy maps the AI engineering lifecycle so you understand how the pieces fit and what it takes to ship from prototype to production. ## Why we are publishing this -Langfuse is open source, and we want to open source the conceptual side of AI engineering too. The Academy is our attempt to make the mental models, vocabulary, and workflows behind LLM application development easier to access for everyone. - -## Who this is for +Langfuse is open source, and we want to open source the conceptual side of AI engineering too. The Academy is our way of making the core ideas, vocabulary, and workflows behind LLM application development easier to access for everyone. -- AI engineers building LLM applications and agents -- Software engineers moving into AI product development + +- AI engineers and software engineers building LLM applications and agentic systems - Product managers who need to reason about quality, iteration, and tradeoffs -- People learning the field and trying to understand the core concepts -- Technical and business leaders who need a working model of how AI systems are built and improved +- Technical and business leaders who need a working understanding of how AI systems are built and improved +- AI agents that support humans in understanding AI engineering concepts and workflows + ## What you will find here -Academy follows the AI engineering lifecycle from first visibility into production behavior all the way to structured improvement and evaluation. The goal is to explain why each step exists, what problem it solves, and how the steps connect. 
+The Langfuse Academy follows the AI engineering lifecycle from first visibility into production behavior all the way to structured improvement and evaluation. The goal is to explain why each step exists, what problem it solves, and how the steps connect. Start with [The AI Engineering Loop](/academy/ai-engineering-loop) for the high-level map, then go deeper into the individual parts: - [Tracing](/academy/tracing) - [Monitoring](/academy/monitoring) -- [Datasets](/academy/datasets) -- [Experiments](/academy/experiments) -- [Evaluate](/academy/evaluate) +- [Building Datasets](/academy/datasets) +- [Experimenting](/academy/experiments) +- [Evaluating](/academy/evaluate) -Some pages stay at the conceptual level, and some go deeper into specific disciplines. You can read the full sequence or jump to the topic that is most relevant to your team right now. +Some pages explain the high-level concepts. Others are deeper dives into individual parts of the lifecycle. You can read the full sequence or jump to the topic that is most relevant to your team right now. ## Academy and docs do different jobs -Academy focuses on high-level concepts and mental models. The [docs](/docs) and [guides](/guides) cover Langfuse features, product implementation details, and step-by-step how-tos. +Academy focuses on high-level concepts and how the lifecycle fits together. The [docs](/docs) and [guides](/guides) cover Langfuse features, product implementation details, and step-by-step how-tos. -Use Academy to understand the lifecycle. Use the docs when you are ready to implement it in Langfuse. +Use Academy to understand the lifecycle. Use the docs and guides when you are ready to implement it in Langfuse. 
From 70a1a16829092c3f0df4860700e9fb20551b5b1c Mon Sep 17 00:00:00 2001 From: Annabell Schaefer Date: Wed, 29 Apr 2026 09:35:41 +0900 Subject: [PATCH 3/3] updated loop page --- content/academy/ai-engineering-loop.mdx | 47 ++++++++++++++++--------- content/academy/index.mdx | 21 +++++------ 2 files changed, 42 insertions(+), 26 deletions(-) diff --git a/content/academy/ai-engineering-loop.mdx b/content/academy/ai-engineering-loop.mdx index aa599db70..f9812b193 100644 --- a/content/academy/ai-engineering-loop.mdx +++ b/content/academy/ai-engineering-loop.mdx @@ -7,21 +7,17 @@ import { Activity, BadgeCheck, Database, FlaskConical, Route } from "lucide-reac # The AI Engineering Loop -Building with LLMs is not a one-way delivery process. A system can be technically healthy and still fail on output quality, cost, latency, or consistency once it meets real users. +The AI Engineering Loop is how teams approach the continuous evolution and improvement of their AI-powered systems. It connects what happens in production directly to the work of improving quality, cost, latency, and reliability during development. -That is why AI engineering needs a loop. Teams need a way to observe real behavior, identify failure modes, turn those findings into test cases, compare improvements, and decide what is actually worth shipping. +Many of the underlying concepts mirror traditional software engineering, but a key differentiator is the probabilistic nature of LLM outputs and the sheer number of paths a system can take. You cannot unit-test your way to confidence. You need a systematic way to observe, learn, and improve. ![The AI engineering loop](/images/academy/loop-overview.png) -The loop below is a practical way to think about that work. It connects production visibility with structured improvement, so teams can move from "something feels off" to "we know what changed, why it changed, and whether it is better." +The loop clusters into two areas of work. -## Read it as a loop +## 1. 
Understanding what's happening in production -Start in production: tracing captures what happened, monitoring tells you what deserves attention, datasets turn recurring patterns into repeatable test cases, experiments isolate changes, and evaluation tells you whether the new version is actually better. - -Once you ship a change, the cycle starts again. The updated system creates new traces, new monitoring signals, and new opportunities to improve. - -## From production signals to better systems +The first part is about visibility. What is your system actually doing in the real world? Which requests are going well, and which are failing in ways that matter? } arrow > - Capture the full path of a request, including prompts, retrieved context, tool calls, outputs, latency, and cost. + Capture the full path of a request, including prompts, retrieved context, tool calls, outputs, latency, and cost. Tracing is the raw record of what your system actually did. } arrow > - Track how the system behaves over time and surface the traces that deserve attention. + Track how the system behaves over time and surface the traces that deserve attention. Monitoring turns a stream of raw data into an ongoing understanding of how the system evolves. + + +## 2. Improving systematically during development + +The second part is about turning what you have observed into improvements you can trust — without degrading the parts of the system that are already working. + + } arrow > - Turn real scenarios into repeatable test cases so you can measure whether a change helps across more than a few examples. + Turn real scenarios surfaced through monitoring into repeatable test cases. Instead of testing against a handful of hand-picked examples, you build a set that reflects how the system actually gets used. } arrow > - Change one variable at a time, compare it against a stable baseline, and learn what actually improved. 
+ Change one variable at a time — a prompt, a model, a retrieval strategy — and compare it against a stable baseline. That way you know what actually improved instead of guessing. } arrow > - Decide whether results are good enough to ship using manual review, code-based checks, or LLM judges. + Decide whether results are good enough to ship using manual review, code-based checks, or LLM judges. Evaluation is how you turn a comparison into a decision. -## What teams are balancing +Once you ship a change, the cycle starts again. The updated system produces new traces, new monitoring signals, and new opportunities to improve. + +## You don't have to close the full loop on day one + +Most teams don't start with all five steps in place. That is fine. + +The value of the loop is cumulative. Each step you add gives you better signal, more systematic coverage, and more confidence in what you are shipping. The goal is not to implement everything at once — it is to understand where you are and take the next step toward closing the loop. + +{/* TODO: Link blog article about patterns of AI engineering lifecycle adoption once written */} + +## Start with tracing + +The natural place to begin is tracing. You cannot monitor what you cannot see, and you cannot improve what you cannot measure. Tracing is the foundation everything else builds on. -Across the loop, teams are balancing output quality, latency, and cost. The goal is to make those tradeoffs explicit and grounded in evidence from your own application. +[→ Start with Tracing](/academy/tracing) diff --git a/content/academy/index.mdx b/content/academy/index.mdx index 858d3feab..bdd33502c 100644 --- a/content/academy/index.mdx +++ b/content/academy/index.mdx @@ -3,22 +3,13 @@ title: Langfuse Academy description: Understand why LLM engineering is different and how to navigate the full AI engineering lifecycle. 
--- -# Welcome to Langfuse Academy test +# Welcome to Langfuse Academy Building with LLMs changes what it means for a system to work. Outputs are probabilistic. A system can run fine and still produce responses that are wrong, off-brand, or useless. Teams need to reason about quality, cost, latency, and the tradeoffs between them. Langfuse Academy maps the AI engineering lifecycle so you understand how the pieces fit and what it takes to ship from prototype to production. -## Why we are publishing this - -Langfuse is open source, and we want to open source the conceptual side of AI engineering too. The Academy is our way of making the core ideas, vocabulary, and workflows behind LLM application development easier to access for everyone. - -- AI engineers and software engineers building LLM applications and agentic systems -- Product managers who need to reason about quality, iteration, and tradeoffs -- Technical and business leaders who need a working understanding of how AI systems are built and improved -- AI agents that support humans in understanding AI engineering concepts and workflows - ## What you will find here @@ -39,3 +30,13 @@ Some pages explain the high-level concepts. Others are deeper dives into individ Academy focuses on high-level concepts and how the lifecycle fits together. The [docs](/docs) and [guides](/guides) cover Langfuse features, product implementation details, and step-by-step how-tos. Use Academy to understand the lifecycle. Use the docs and guides when you are ready to implement it in Langfuse. 
+ + +- AI engineers and software engineers building LLM applications and agentic systems +- Product managers who need to reason about quality, iteration, and tradeoffs +- Technical and business leaders who need a working understanding of how AI systems are built and improved +- AI agents that support humans in understanding AI engineering concepts and workflows + + +## Why we are publishing this +Langfuse is open source, and we want to open source the conceptual side of AI engineering too. The Academy is our way of making the core ideas, vocabulary, and workflows behind LLM application development easier to access for everyone.