docs: document multi-modal datasets by wochinge · Pull Request #3119 · langfuse/langfuse-docs

wochinge · 2026-06-17T08:46:54Z

Summary

Document multi-modal dataset item creation in the datasets docs, including UI upload flow, SDK examples, supported SDK versions, and the SDK-only experiment limitation.
Add SDK experiment guidance for resolving dataset media references and using LangfuseMediaReference in Python and JS/TS.
Add a dated changelog entry and cross-link related data model, multi-modality, and self-hosted blob storage docs.

Linear

LFE-10362

Major Decisions

Keep multi-modal dataset guidance as a separate section from dataset creation because media support applies to dataset items and SDK-based experiments, not to dataset setup.
Keep UI-based experiment limitations explicit wherever multi-modal datasets are introduced.

Review Focus

Confirm the SDK minimum versions and UI-based experiment caveat are accurate.
Check that the new creation video and changelog wording match the intended launch positioning.

Greptile Summary

This PR adds documentation for multi-modal dataset items — covering UI upload flow, Python/JS/TS SDK creation examples, SDK version callouts, and an end-to-end SDK experiment guide using LangfuseMediaReference. It also introduces a new changelog entry and cross-links the blobstorage, multi-modality, and data-model reference pages.

New multi-modal section in datasets.mdx documents item creation via UI and SDK, with version constraints (Python ≥ 4.10.0, @langfuse/client ≥ 5.5.0) and a UI-experiment limitation callout.
New experiments-via-sdk.mdx subsection shows full experiment flow: fetch dataset with resolve_media_references=True, unpack LangfuseMediaReference, and call the model provider with bytes/base64/data-URI.
data-model.mdx gains a mediaReferences field on DatasetItem and a new DatasetItemMediaReference object table; the media sub-field is marked Required: Yes but can be null, which is worth revisiting for clarity.
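
As a rough illustration of the three payload shapes mentioned in the experiment flow above (raw bytes, base64, and a data URI), the sketch below converts in-memory bytes using only the Python standard library. It does not call the Langfuse SDK: `to_base64` and `to_data_uri` are hypothetical helper names, and the content type is an assumed input rather than something read from a `LangfuseMediaReference`.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode raw bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def to_data_uri(data: bytes, content_type: str) -> str:
    """Build a base64 data URI from raw bytes and a MIME type."""
    return f"data:{content_type};base64,{to_base64(data)}"


if __name__ == "__main__":
    payload = b"hello"  # stand-in for image bytes (assumption for the demo)
    print(to_base64(payload))
    print(to_data_uri(payload, "text/plain"))
```

Which of the three shapes a model provider expects varies by provider, so the choice is left to the caller here.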

Confidence Score: 4/5

Safe to merge; all changes are documentation-only with no runtime code paths.

The changes are well-structured and internally consistent. The data-model table marks the media sub-field as Required while describing it as nullable, which could mislead SDK consumers. Pre-existing JS/TS examples using the old langfuse.api.datasetItems.create pattern were not updated to match the new langfuse.dataset.createItem shape introduced elsewhere on the same page. Both are minor doc-clarity issues with no functional impact.

content/docs/evaluation/experiments/data-model.mdx (nullable-but-Required field) and content/docs/evaluation/experiments/datasets.mdx (API shape inconsistency in JS/TS examples).
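
For the nullable-but-Required field noted above, a consumer-side guard might look like the following minimal sketch. It assumes the media reference arrives as a plain dictionary with a `media` key whose value may be `null`. The dictionary shape and the helper name `has_media` are assumptions for illustration, not the documented SDK types.

```python
from typing import Any, Dict


def has_media(reference: Dict[str, Any]) -> bool:
    """Return True only when `media` is present and not null.

    A missing key and an explicit null are handled the same way, so callers
    do not rely on the "Required" column to skip the check.
    """
    return reference.get("media") is not None


if __name__ == "__main__":
    print(has_media({"media": None}))
    print(has_media({"media": {"placeholder": "value"}}))
```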

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([User / CI]) -->|1 wrap file in LangfuseMedia| B[SDK: create_dataset_item / createItem]
    B -->|2 upload bytes via presigned URL| C[(S3 / Blob Storage)]
    B -->|3 store media reference token| D[(Langfuse DB: DatasetItem)]
    D -->|4 UI reads token| E[Langfuse UI: renders preview]

    A2([Experiment Runner]) -->|5 get_dataset resolve_media_references=True| D
    D -->|6 generate signed download URL| C
    C -->|7 return DatasetItemMediaReference with signed url + urlExpiry| A2
    A2 -->|8 fetch_bytes / fetch_base64 / fetch_data_uri| C
    A2 -->|9 pass media to model provider| F[LLM / Vision Model]
    F -->|10 output| A2
    A2 -->|11 scores + traces| D
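
Steps 7 and 8 of the flowchart imply that a signed URL can expire between resolution and download. The sketch below shows one way a runner might check that before fetching. It assumes `urlExpiry` is an ISO 8601 timestamp string, which the flowchart does not state, and `parse_expiry`, `is_url_usable`, and the 30-second safety margin are hypothetical choices, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiry(url_expiry: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    parsed = datetime.fromisoformat(url_expiry.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_url_usable(
    url_expiry: str,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(seconds=30),
) -> bool:
    """Return True if the URL stays valid for longer than `margin`."""
    current = now if now is not None else datetime.now(timezone.utc)
    return parse_expiry(url_expiry) - margin > current


if __name__ == "__main__":
    fixed_now = datetime(2026, 6, 23, 10, 0, tzinfo=timezone.utc)
    print(is_url_usable("2026-06-23T10:05:00Z", now=fixed_now))
```

When the check fails, re-fetching the dataset item to obtain a fresh signed URL is the natural fallback, though the exact call for that is not shown in this thread.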

Prompt To Fix All With AI

Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
content/docs/evaluation/experiments/data-model.mdx:79
**`media` Required field contradicts nullable description**

The table marks `media` as `Required: Yes`, but the description immediately states it can be `null`. API consumers who rely on the `Required` column to skip null-checks will get runtime errors when referencing media that was never successfully uploaded. Consider changing Required to `No` (or `Yes (nullable)`) to signal that the field can be absent or null.

### Issue 2 of 3
content/docs/evaluation/experiments/datasets.mdx:456-464
**Old JS/TS SDK call pattern still present after new pattern introduced**

This PR updates `datasets-create-dataset-item.mdx` to use `langfuse.dataset.createItem(...)` but the "Create items from production data" section here (and the "Edit/archive dataset items" section below) still call `langfuse.api.datasetItems.create(...)`. Within the same page, readers now see two different API shapes for the same operation, which can cause confusion about the canonical way to create items in JS/TS.

### Issue 3 of 3
content/docs/evaluation/experiments/experiments-via-sdk.mdx:241
**`assert` as runtime type guard is unsafe in production code**

`assert isinstance(image, LangfuseMediaReference)` is disabled when Python runs with the `-O` / `-OO` optimisation flags, meaning the guard silently disappears in optimised builds. For documentation examples that readers copy verbatim, an explicit `if` check (e.g. `if not isinstance(image, LangfuseMediaReference): raise TypeError(...)`) or a simple inline comment noting the expected type would be safer.
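
The explicit check suggested in Issue 3 can be sketched as follows. `MediaRef` is a hypothetical stand-in class used so the example runs without the Langfuse SDK, and `require_media_ref` is an illustrative helper name. Unlike `assert`, the `if` / `raise` form is still executed when Python runs with `-O`.

```python
class MediaRef:
    """Hypothetical stand-in for the SDK's media reference type."""

    def __init__(self, data: bytes) -> None:
        self.data = data


def require_media_ref(value: object) -> MediaRef:
    """Return `value` unchanged if it is a MediaRef, else raise TypeError."""
    if not isinstance(value, MediaRef):
        raise TypeError(f"expected MediaRef, got {type(value).__name__}")
    return value


if __name__ == "__main__":
    ref = require_media_ref(MediaRef(b"\x89PNG"))
    print(len(ref.data))
    try:
        require_media_ref("not a media reference")
    except TypeError as exc:
        print(exc)
```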

Reviews (1): Last reviewed commit: "docs(evaluation): document multi-modal d..."

Greptile also left 2 inline comments on this PR.

vercel · 2026-06-17T08:47:00Z

Project	Deployment	Actions	Updated (UTC)
langfuse-docs	Ready	Preview, Comment	Jun 23, 2026 10:05am

github-actions · 2026-06-17T08:47:07Z

@claude review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f1207db0b

claude

Prior review comments are all resolved by recent commits and no new bugs were found — LGTM.

Extended reasoning...

Overview

Docs-only PR documenting multi-modal dataset items: a new section in datasets.mdx, a new multi-modal experiments subsection in experiments-via-sdk.mdx, new DatasetItemMediaReference reference rows in data-model.mdx, a dated changelog entry, and cross-links from the multi-modality and self-hosted blob storage pages. The embedded components-mdx/datasets-create-dataset-item.mdx snippet was also migrated from langfuse.api.datasetItems.create to langfuse.dataset.createItem.

Security risks

None — content-only .mdx changes with no runtime code, secrets, or auth-adjacent surfaces touched. The self-hosted blob storage page edits are wording-only (renaming a section header, broadening one sentence to mention datasets); no env-var defaults or example credentials changed.

Level of scrutiny

Low. Docs PRs that ship through the Next.js docs site have no runtime blast radius beyond the rendered page, and an active human author (Tobias Wochinger) is iterating on it. My prior pass already exercised the code samples against the SDK type definitions; the follow-up commits picked up each correction.

Other factors

All three of my previous inline comments are resolved: the JS/TS task signature is now async (item) => in experiments-via-sdk.mdx, the pre-existing same-bug instances in datasets.mdx and the versioned-experiments changelog were fixed in the same pass, both stale JS/TS langfuse.api.datasetItems.create call sites in datasets.mdx were migrated, and the expected_output enum value was clarified inline (the API value really is snake_case; the new wording "expected_output (for expectedOutput)" disambiguates it from the JS field name). Greptile's open suggestion about marking media as Required: No instead of Yes (nullable) is a wording preference, and Tobias explicitly pushed back on the assert isinstance suggestion for reader pedagogy — both are within author discretion and not blockers. The bug-hunting system found no new issues on the current commit.

docs(evaluation): document multi-modal datasets

7f1207d

dosubot Bot added the size:L (This PR changes 100-499 lines, ignoring generated files.) and docs labels Jun 17, 2026

wochinge requested a review from Lotte-Verheyden June 17, 2026 08:49

vercel Bot deployed to Preview June 17, 2026 08:49 View deployment

greptile-apps Bot reviewed Jun 17, 2026

Comment thread content/docs/evaluation/experiments/data-model.mdx Outdated

Comment thread content/docs/evaluation/experiments/experiments-via-sdk.mdx

chatgpt-codex-connector Bot reviewed Jun 17, 2026

Comment thread components-mdx/datasets-create-dataset-item.mdx

claude Bot reviewed Jun 17, 2026

Comment thread content/docs/evaluation/experiments/experiments-via-sdk.mdx

Comment thread components-mdx/datasets-create-dataset-item.mdx

Comment thread content/docs/evaluation/experiments/data-model.mdx Outdated

docs(evaluation): address multi-modal dataset review

6dd4569

vercel Bot deployed to Preview June 17, 2026 09:20 View deployment

docs(evaluation): reflect default dataset media resolution

b1314db

vercel Bot deployed to Preview June 23, 2026 07:51 View deployment

wochinge commented Jun 23, 2026

Comment thread content/docs/evaluation/experiments/datasets.mdx Outdated

claude Bot reviewed Jun 23, 2026

docs(evaluation): update multi-modal dataset release versions

d3d81e1

vercel Bot deployed to Preview June 23, 2026 08:43 View deployment

docs(evaluation): clarify dataset media url expiry

6046891

wochinge enabled auto-merge June 23, 2026 10:04

dosubot Bot added the auto-merge (This PR is set to be merged) label Jun 23, 2026

vercel Bot deployed to Preview June 23, 2026 10:05 View deployment

wochinge added this pull request to the merge queue Jun 23, 2026

Merged via the queue into main with commit a2865cd Jun 23, 2026
17 checks passed

wochinge deleted the feature/lfe-10362-v1-docs branch June 23, 2026 10:12

dosubot Bot removed the auto-merge (This PR is set to be merged) label Jun 23, 2026

wochinge merged 5 commits into main from feature/lfe-10362-v1-docs
