docs: document multi-modal datasets #3119

Merged
wochinge merged 5 commits into main from feature/lfe-10362-v1-docs on Jun 23, 2026

@wochinge (Contributor) commented Jun 17, 2026

Summary

  • Document multi-modal dataset item creation in the datasets docs, including the UI upload flow, SDK examples, supported SDK versions, and the limitation that experiments on multi-modal datasets can only be run via the SDK.
  • Add SDK experiment guidance for resolving dataset media references and using LangfuseMediaReference in Python and JS/TS.
  • Add a dated changelog entry and cross-link related data model, multi-modality, and self-hosted blob storage docs.

Linear

Major Decisions

  • Keep multi-modal dataset guidance as a separate section from dataset creation because media support applies to dataset items and SDK-based experiments, not to dataset setup.
  • Keep UI-based experiment limitations explicit wherever multi-modal datasets are introduced.

Review Focus

  • Confirm the SDK minimum versions and UI-based experiment caveat are accurate.
  • Check that the new creation video and changelog wording match the intended launch positioning.

Greptile Summary

This PR adds documentation for multi-modal dataset items, covering the UI upload flow, Python and JS/TS SDK creation examples, SDK version callouts, and an end-to-end SDK experiment guide using LangfuseMediaReference. It also introduces a new changelog entry and cross-links the blob storage, multi-modality, and data-model reference pages.

  • New multi-modal section in datasets.mdx documents item creation via UI and SDK, with version constraints (Python ≥ 4.10.0, @langfuse/client ≥ 5.5.0) and a UI-experiment limitation callout.
  • New experiments-via-sdk.mdx subsection shows full experiment flow: fetch dataset with resolve_media_references=True, unpack LangfuseMediaReference, and call the model provider with bytes/base64/data-URI.
  • data-model.mdx gains a mediaReferences field on DatasetItem and a new DatasetItemMediaReference object table; the media sub-field is marked Required: Yes but can be null, which is worth revisiting for clarity.
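The experiment guide passes media to the model provider as raw bytes, base64, or a data URI. These three forms are related by standard encodings. The following is a minimal, stdlib-only sketch, assuming the SDK's fetch_bytes / fetch_base64 / fetch_data_uri helpers follow the usual RFC 2397 `data:<mime>;base64,<payload>` convention (this PR does not spell out their exact output format). The helper names `to_base64` and `to_data_uri` are hypothetical and not part of the Langfuse SDK.

```python
import base64


def to_base64(data: bytes) -> str:
    """Encode raw media bytes as a base64 ASCII string (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


def to_data_uri(data: bytes, content_type: str) -> str:
    """Build an RFC 2397 data URI from raw bytes and a MIME type (hypothetical helper)."""
    return f"data:{content_type};base64,{to_base64(data)}"


# The 8-byte PNG file signature, used here as a tiny stand-in for real image bytes.
png_signature = b"\x89PNG\r\n\x1a\n"
print(to_data_uri(png_signature, "image/png"))
```

Which of the three forms to send depends on the model provider's API. Bytes are the most compact in memory, while a data URI embeds the MIME type alongside the payload.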

Confidence Score: 4/5

Safe to merge; all changes are documentation-only with no runtime code paths.

The changes are well-structured and internally consistent. The data-model table marks the media sub-field as Required while describing it as nullable, which could mislead SDK consumers. Pre-existing JS/TS examples using the old langfuse.api.datasetItems.create pattern were not updated to match the new langfuse.dataset.createItem shape introduced elsewhere on the same page. Both are minor doc-clarity issues with no functional impact.

Files that need attention: content/docs/evaluation/experiments/data-model.mdx (nullable-but-Required field) and content/docs/evaluation/experiments/datasets.mdx (API shape inconsistency in JS/TS examples).

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([User / CI]) -->|1 wrap file in LangfuseMedia| B[SDK: create_dataset_item / createItem]
    B -->|2 upload bytes via presigned URL| C[(S3 / Blob Storage)]
    B -->|3 store media reference token| D[(Langfuse DB: DatasetItem)]
    D -->|4 UI reads token| E[Langfuse UI: renders preview]

    A2([Experiment Runner]) -->|5 get_dataset resolve_media_references=True| D
    D -->|6 generate signed download URL| C
    C -->|7 return DatasetItemMediaReference with signed url + urlExpiry| A2
    A2 -->|8 fetch_bytes / fetch_base64 / fetch_data_uri| C
    A2 -->|9 pass media to model provider| F[LLM / Vision Model]
    F -->|10 output| A2
    A2 -->|11 scores + traces| D
```
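Steps 6 to 8 of the flowchart imply that a resolved media reference carries a time-limited signed URL (url + urlExpiry), so a long-running experiment may need to re-resolve dataset items before downloading media. Below is a minimal sketch of that expiry check. It assumes urlExpiry is an ISO 8601 timestamp with a UTC offset (the format is not stated in this PR), and the helper name `needs_refresh` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(
    url_expiry: str,
    now: datetime,
    margin: timedelta = timedelta(seconds=30),
) -> bool:
    """Return True if a signed URL expires within `margin` of `now`.

    Assumes `url_expiry` is ISO 8601 with an offset, e.g. "2026-06-23T10:05:00+00:00".
    A trailing "Z" is normalised because datetime.fromisoformat only accepts it
    natively on Python 3.11+.
    """
    expiry = datetime.fromisoformat(url_expiry.replace("Z", "+00:00"))
    return now + margin >= expiry


now = datetime(2026, 6, 23, 10, 0, tzinfo=timezone.utc)
print(needs_refresh("2026-06-23T10:05:00Z", now))
```

The safety margin covers the time between the check and the actual download, so a URL that is about to expire is refreshed instead of failing mid-request.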
Prompt To Fix All With AI
Fix the following 3 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 3
content/docs/evaluation/experiments/data-model.mdx:79
**`media` Required field contradicts nullable description**

The table marks `media` as `Required: Yes`, but the description immediately states it can be `null`. API consumers who rely on the `Required` column to skip null-checks will get runtime errors when referencing media that was never successfully uploaded. Consider changing Required to `No` (or `Yes (nullable)`) to signal that the field can be absent or null.
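The issue hinges on the difference between a key that is always present (Required) and a value that may still be `null` (nullable). Here is a minimal sketch of how a consumer should treat such a field. `MediaReferenceRow` is a hypothetical stand-in for a `DatasetItemMediaReference` entry, and its `path` field is an assumption, not the documented schema. Only `media` comes from the table under review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaReferenceRow:
    """Hypothetical stand-in for one DatasetItemMediaReference entry.

    `media` is always present (Required) but may be None (nullable),
    e.g. when the referenced upload never completed.
    """

    path: str  # hypothetical: where the reference sits in the item
    media: Optional[dict]


def usable_media(refs: List[MediaReferenceRow]) -> List[dict]:
    """Keep only references whose media payload is not null."""
    return [ref.media for ref in refs if ref.media is not None]


refs = [
    MediaReferenceRow("input.image", {"url": "https://example.com/a.png"}),
    MediaReferenceRow("input.audio", None),  # present, but null
]
print(usable_media(refs))
```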

### Issue 2 of 3
content/docs/evaluation/experiments/datasets.mdx:456-464
**Old JS/TS SDK call pattern still present after new pattern introduced**

This PR updates `datasets-create-dataset-item.mdx` to use `langfuse.dataset.createItem(...)` but the "Create items from production data" section here (and the "Edit/archive dataset items" section below) still call `langfuse.api.datasetItems.create(...)`. Within the same page, readers now see two different API shapes for the same operation, which can cause confusion about the canonical way to create items in JS/TS.

### Issue 3 of 3
content/docs/evaluation/experiments/experiments-via-sdk.mdx:241
**`assert` as runtime type guard is unsafe in production code**

`assert isinstance(image, LangfuseMediaReference)` is disabled when Python runs with the `-O` / `-OO` optimisation flags, meaning the guard silently disappears in optimised builds. For documentation examples that readers copy verbatim, an explicit `if` check (e.g. `if not isinstance(image, LangfuseMediaReference): raise TypeError(...)`) or a simple inline comment noting the expected type would be safer.
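The explicit guard suggested here can be sketched as follows. So that the snippet runs on its own, `MediaReferenceStub` is a hypothetical stand-in for the SDK's `LangfuseMediaReference`. In real code, the `isinstance` check would target the imported SDK class instead. The helper name `require_media_reference` is also hypothetical.

```python
class MediaReferenceStub:
    """Hypothetical stand-in for LangfuseMediaReference, used only so this sketch is runnable."""

    def __init__(self, content_type: str) -> None:
        self.content_type = content_type


def require_media_reference(value: object) -> MediaReferenceStub:
    """Narrow `value` to the media reference type.

    Unlike `assert isinstance(...)`, this check is not stripped
    when Python runs with -O or -OO.
    """
    if not isinstance(value, MediaReferenceStub):
        raise TypeError(
            f"expected a media reference, got {type(value).__name__}"
        )
    return value


image = require_media_reference(MediaReferenceStub("image/png"))
print(image.content_type)
```

Returning the narrowed value also lets static type checkers treat `image` as the media reference type on the following lines, which `assert` achieves only when assertions are enabled.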

Reviews (1): Last reviewed commit: "docs(evaluation): document multi-modal d..."

Greptile also left 2 inline comments on this PR.

@vercel (bot) commented Jun 17, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| langfuse-docs | Ready | Preview, Comment | Jun 23, 2026 10:05am |

@dosubot (bot) added the size:L (This PR changes 100-499 lines, ignoring generated files.) and docs labels Jun 17, 2026
@github-actions commented:

@claude review

Comment thread content/docs/evaluation/experiments/data-model.mdx Outdated
Comment thread content/docs/evaluation/experiments/experiments-via-sdk.mdx

@chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f1207db0b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread components-mdx/datasets-create-dataset-item.mdx
Comment thread content/docs/evaluation/experiments/experiments-via-sdk.mdx
Comment thread components-mdx/datasets-create-dataset-item.mdx
Comment thread content/docs/evaluation/experiments/data-model.mdx Outdated
Comment thread content/docs/evaluation/experiments/datasets.mdx Outdated

@claude (bot) left a comment

Prior review comments are all resolved by recent commits and no new bugs were found — LGTM.

Extended reasoning...

Overview

Docs-only PR documenting multi-modal dataset items: a new section in datasets.mdx, a new multi-modal experiments subsection in experiments-via-sdk.mdx, new DatasetItemMediaReference reference rows in data-model.mdx, a dated changelog entry, and cross-links from the multi-modality and self-hosted blob storage pages. The embedded components-mdx/datasets-create-dataset-item.mdx snippet was also migrated from langfuse.api.datasetItems.create to langfuse.dataset.createItem.

Security risks

None — content-only .mdx changes with no runtime code, secrets, or auth-adjacent surfaces touched. The self-hosted blob storage page edits are wording-only (renaming a section header, broadening one sentence to mention datasets); no env-var defaults or example credentials changed.

Level of scrutiny

Low. Docs PRs that ship through the Next.js docs site have no runtime blast radius beyond the rendered page, and an active human author (Tobias Wochinger) is iterating on it. My prior pass already exercised the code samples against the SDK type definitions; the follow-up commits picked up each correction.

Other factors

All three of my previous inline comments are resolved: the JS/TS task signature is now async (item) => in experiments-via-sdk.mdx; pre-existing instances of the same bug in datasets.mdx and the versioned-experiments changelog were fixed in the same pass; both stale JS/TS langfuse.api.datasetItems.create call sites in datasets.mdx were migrated; and the expected_output enum value was clarified inline (the API value really is snake_case, and the new wording "expected_output (for expectedOutput)" disambiguates it from the JS field name). Greptile's open suggestion about marking media as Required: No instead of Yes (nullable) is a wording preference, and Tobias explicitly pushed back on the assert isinstance suggestion for reader pedagogy; both are within author discretion and not blockers. The bug-hunting system found no new issues on the current commit.

@wochinge enabled auto-merge June 23, 2026 10:04
@dosubot (bot) added the auto-merge (This PR is set to be merged) label Jun 23, 2026
@wochinge added this pull request to the merge queue Jun 23, 2026
Merged via the queue into main with commit a2865cd Jun 23, 2026
17 checks passed
@wochinge deleted the feature/lfe-10362-v1-docs branch June 23, 2026 10:12
@dosubot (bot) removed the auto-merge (This PR is set to be merged) label Jun 23, 2026

Labels

docs size:L This PR changes 100-499 lines, ignoring generated files.

1 participant