From 1b1b216959ead6a2292916663d809c5ae73bc934 Mon Sep 17 00:00:00 2001
From: Tobias Wochinger
Date: Mon, 8 Jun 2026 23:47:23 +0200
Subject: [PATCH 1/9] docs(blog): add post on running untrusted code for code evaluators

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 ...026-06-08-running-customer-code-safely.mdx | 170 ++++++++++++++++++
 .../guardduty-access-denied.png | Bin 0 -> 394000 bytes
 .../guardduty-external-ip.png | Bin 0 -> 42597 bytes
 3 files changed, 170 insertions(+)
 create mode 100644 content/blog/2026-06-08-running-customer-code-safely.mdx
 create mode 100644 public/images/blog/2026-06-08-running-customer-code-safely/guardduty-access-denied.png
 create mode 100644 public/images/blog/2026-06-08-running-customer-code-safely/guardduty-external-ip.png

diff --git a/content/blog/2026-06-08-running-customer-code-safely.mdx b/content/blog/2026-06-08-running-customer-code-safely.mdx
new file mode 100644
index 000000000..2d47564c5
--- /dev/null
+++ b/content/blog/2026-06-08-running-customer-code-safely.mdx
@@ -0,0 +1,170 @@
+---
+title: "The week users extracted our AWS credentials — and why we slept fine"
+date: 2026/06/08
+description: "Code evaluators let Langfuse users run their own Python or TypeScript to score incoming LLM observability data. Here is how we safely execute that code across {{TENANT_COUNT}} tenants and {{MONTHLY_RUNS}} monthly runs."
+tag: engineering
+author: Tobias
+---
+
+import { BlogHeader } from "@/components/blog/BlogHeader";
+
+
+
+Two weeks ago we shipped [code evaluators](/docs/evaluation/evaluation-methods/code-evaluators): you write a small Python or TypeScript function and Langfuse runs it to score your traces. Last week, a few users started probing the boundaries — extracting our AWS credentials from the runtime to find out what an attacker could actually do with them.
+
+We are still sleeping fine. This post is about the system that lets us: how we run our users' code inside our own infrastructure, next to **{{TENANT_COUNT}} other tenants**, at peak around **{{MONTHLY_RUNS}} executions a month**, without it becoming the scariest thing we operate.
+
+