diff --git a/content/blog/2026-06-22-code-evaluators-execution-model.mdx b/content/blog/2026-06-22-code-evaluators-execution-model.mdx
new file mode 100644
index 000000000..3cbed32b2
--- /dev/null
+++ b/content/blog/2026-06-22-code-evaluators-execution-model.mdx
@@ -0,0 +1,209 @@
+---
+title: "Designing the runtime for Langfuse code evaluators"
+date: 2026/06/22
+description: "Code evaluators let you score traces with your own Python or TypeScript code. A look at the execution model behind them: the requirements, the options we rejected, and the security stance we adopted."
+tag: engineering
+author: Tobias
+---
+
+import { BlogHeader } from "@/components/blog/BlogHeader";
+
+
+
+In late May we shipped [code evaluators](/docs/evaluation/evaluation-methods/code-evaluators): you write a small Python or TypeScript function and Langfuse runs it to score your traces. By design, this means effectively anybody can run untrusted code in our multi-tenant SaaS environment: an environment that holds petabytes of critical data from thousands of teams.
+
+This post walks through how we designed the runtime environment for it, the options we ruled out along the way, and what happened when the first people tried to break it.
+
+