
docs: clarify OpenAI Python parse vs response_format guidance#2884

Open

jannikmaierhoefer wants to merge 1 commit into main from claude/quirky-montalcini-52d792

Conversation

@jannikmaierhoefer (Member) commented Apr 30, 2026

Summary

  • Update the OpenAI Python integration page and the structured output cookbook to recommend client.chat.completions.parse(...) for openai>=1.92.0 and scope the beta caveat to older SDK versions.
  • Note that the Langfuse Python SDK instruments both the stable (openai.resources.chat.completions.Completions.parse) and the legacy beta path, so Langfuse attributes (name, metadata, langfuse_session_id, …) work on either.
  • Keep the response_format + type_to_response_format_param example as a fallback for users who cannot upgrade openai.
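The version gate described above can be sketched as a small helper. This is a hypothetical illustration, not part of either SDK — `use_stable_parse` and the bare `major.minor` parsing are assumptions:

```python
# Hypothetical helper: pick the parse entry point based on the installed
# openai version string. parse/stream graduated out of beta in v1.92.0.
def use_stable_parse(openai_version: str) -> bool:
    """Return True if client.chat.completions.parse(...) is the stable path."""
    # Assumes a plain "major.minor.patch" version string (no pre-release tags).
    major, minor = (int(part) for part in openai_version.split(".")[:2])
    return (major, minor) >= (1, 92)
```

When this returns `False`, the legacy `client.beta.chat.completions.parse(...)` path or the `response_format` + `type_to_response_format_param` fallback applies instead.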

Why

Reported by David Traina (Ramp) in Pylon #1339. OpenAI moved parse/stream out of beta in openai-python v1.92.0 ~10 months ago, but our docs still warned against the beta API and pushed users to response_format + chat.completions.create. The SDK has already supported the stable path for a while — only the docs were stale.

Test plan

  • pnpm dev and verify the Structured Output section on /integrations/model-providers/openai-py renders correctly.
  • Verify the regenerated /guides/cookbook/integration_openai_structured_output page renders the updated note and parse example.

🤖 Generated with Claude Code

Disclaimer: Experimental PR review

Greptile Summary

This PR corrects stale documentation that incorrectly told users to avoid client.chat.completions.parse in favour of response_format + create. It updates both the integration page and the cookbook to recommend the stable parse API (available since openai-python v1.92.0) and preserves a type_to_response_format_param fallback for users who cannot upgrade.

Confidence Score: 4/5

Safe to merge — documentation-only changes with accurate technical content and only minor style observations.

All three files are docs/notebook updates with no runtime code. The guidance is factually correct. The only findings are P2: a private-API import risk in the legacy fallback (pre-existing pattern, not introduced here) and mildly ambiguous phrasing in one note.

No files require special attention; the private import in openai-py.mdx is worth a comment but is not blocking.

Important Files Changed

| Filename | Overview |
| --- | --- |
| content/integrations/model-providers/openai-py.mdx | Structured Output section rewritten to recommend the stable parse API (openai>=1.92.0) and retain type_to_response_format_param as a legacy fallback; the fallback imports from a private internal module (openai.lib._parsing._completions). |
| content/guides/cookbook/integration_openai_structured_output.md | Note updated to clarify both parse paths are instrumented; Alternative section switched from client.beta.chat.completions.parse to stable client.chat.completions.parse with a Langfuse name attribute; phrasing in the note is slightly ambiguous. |
| cookbook/integration_openai_structured_output.ipynb | Notebook reformatted to 1-space indentation and updated to mirror the .md changes: stable parse path, name attribute added, old output cells preserved. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User wants Structured Output\nwith Langfuse tracing] --> B{openai SDK version?}
    B -- ">=1.92.0\n(recommended)" --> C["client.chat.completions.parse(...)\nresponse_format=PydanticModel\nname='...' metadata={...}"]
    B -- "<1.92.0\n(legacy)" --> D{Pydantic model needed?}
    D -- Yes --> E["client.beta.chat.completions.parse(...)\nresponse_format=PydanticModel\n(re-routed to stable on >=1.92.0)"]
    D -- No / can't upgrade --> F["type_to_response_format_param(Model)\n→ client.chat.completions.create(...)\nresponse_format=schema_dict"]
    C --> G[Langfuse traces both name\nand metadata attributes ✓]
    E --> G
    F --> G
```
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
content/integrations/model-providers/openai-py.mdx:101
**Private internal API in legacy fallback**

`openai.lib._parsing._completions` is an underscore-prefixed internal module — it is not part of OpenAI's public API surface and can be removed or renamed without a semver-breaking release. Users who follow this fallback path are silently depending on an implementation detail that could break on any minor OpenAI SDK bump, even within `<1.92.0`. Consider noting this risk explicitly, or suggesting users pin their OpenAI version when using this path.
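One way to make that risk explicit in the fallback example is to guard the private import so failure is loud and actionable. This is a sketch, not the documented pattern — `load_legacy_converter` and its error message are assumptions:

```python
# Sketch: fail loudly if the private module moves. The underscore-prefixed
# path openai.lib._parsing._completions is an implementation detail of the
# openai SDK and can change on any release, even within <1.92.0.
def load_legacy_converter():
    try:
        from openai.lib._parsing._completions import type_to_response_format_param
        return type_to_response_format_param
    except ImportError as err:
        raise RuntimeError(
            "openai.lib._parsing._completions is private and unavailable in "
            "this environment; pin your openai version, or upgrade to "
            "openai>=1.92.0 and use client.chat.completions.parse(...)."
        ) from err
```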

### Issue 2 of 2
content/guides/cookbook/integration_openai_structured_output.md:58
**Slightly ambiguous parenthetical in the note**

The clause "for older SDK versions, where `parse` is re-routed to the stable method on newer SDKs" embeds a forward-reference to newer-SDK behaviour inside the description of the older-SDK path, which can read as contradictory. Consider splitting the two facts into separate sentences, e.g. "…the legacy `client.beta.chat.completions.parse(...)` (available on `openai<1.92.0`). On `openai>=1.92.0` the OpenAI SDK re-routes beta calls to the stable method, so either path reaches the same instrumented function."


openai-python>=1.92.0 graduated parse/stream out of beta. The Langfuse
SDK already instruments both client.chat.completions.parse (stable) and
client.beta.chat.completions.parse (legacy), so the previous "use
response_format with chat.completions.create instead of the Beta API"
guidance was outdated. Update the OpenAI Python integration page and
the structured output cookbook to recommend the stable parse helper and
scope the beta caveat to older SDKs.

Reported by David Traina (Ramp) via support.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Langfuse-Session: https://cloud.langfuse.com/project/cloramnkj0002jz088vzn1ja4/sessions/b55454d8-1e5e-477d-a6ba-f81f65fe959c

vercel Bot commented Apr 30, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| langfuse-docs | Ready | Preview, Comment | Apr 30, 2026 6:55am |


@dosubot Bot added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Apr 30, 2026
@review-notebook-app Bot commented

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

@github-actions

@claude review

@dosubot Bot added the documentation label (Improvements or additions to documentation) on Apr 30, 2026

@claude Bot left a comment


Docs-only clarification of the OpenAI Python parse vs response_format guidance — the inline nits are minor wording suggestions and the underlying changes are correct.

Extended reasoning...

Overview

This PR updates the OpenAI Python integration docs (content/integrations/model-providers/openai-py.mdx) and the matching structured-output cookbook (.md and .ipynb) to recommend client.chat.completions.parse(...) for openai>=1.92.0, scope the beta caveat to older SDK versions, and keep the response_format + type_to_response_format_param example as a fallback. It also adds a new parse example with name/metadata Langfuse attributes.

Security risks

None. This change touches only Markdown/MDX/notebook content with no runtime, auth, or user-input handling implications.

Level of scrutiny

Low — documentation-only, no code paths or configuration are affected. The factual claim being introduced (parse graduated out of beta in openai-python v1.92.0 and the Langfuse SDK instruments both the stable and beta parse paths) is consistent with the linked release notes and the rest of the integration docs.

Other factors

The two inline nits posted are wording-level: (1) the new bullet groups langfuse_session_id alongside direct kwargs even though it is a metadata key, and (2) the #### Structured Output subsection now lives under an ### OpenAI Beta APIs parent whose intro still says beta APIs require manual @observe() wrapping. Neither is incorrect documentation per se — the canonical 'Custom trace properties' table and a correct metadata={...} example are right above and below the new prose — and a Vercel preview is already building for visual verification. These are the kind of small editorial tweaks a maintainer can take or leave; they don't gate approval.

Comment on lines +386 to +387
- **`openai>=1.92.0` (recommended):** use `client.chat.completions.parse(...)`. OpenAI graduated `parse` and `stream` out of beta in [v1.92.0](https://github.com/openai/openai-python/releases/tag/v1.92.0), and Langfuse wraps the stable `openai.resources.chat.completions.Completions.parse` (and the async variant). You can pass a Pydantic model directly via `response_format` and still set Langfuse attributes such as `name`, `metadata`, `langfuse_session_id`, etc.
- **`openai<1.92.0` (legacy):** the parse helper is only available under `client.beta.chat.completions.parse(...)`. Langfuse also wraps the beta path on these older versions, so attributes like `name` and `metadata` work there too.

🟡 The new note groups name, metadata, and langfuse_session_id together as 'Langfuse attributes' (line 386 and again in the cookbook .md/.ipynb), implying all three are direct kwargs of chat.completions.parse(...). However, the same MDX file's 'Custom trace properties' table (lines 206-211) lists only name/metadata/trace_id/parent_observation_id as direct kwargs, and the 'Setting trace attributes' section (line 230) shows that langfuse_session_id, langfuse_user_id, and langfuse_tags must live inside the metadata dict. A reader could try parse(..., langfuse_session_id='x') and have it silently dropped. Suggest rewording to something like 'attributes such as name and metadata (with langfuse_session_id, langfuse_user_id, langfuse_tags etc. nested inside metadata)' — note the existing example at line 416 already demonstrates the correct pattern with metadata={"langfuse_tags": [...]}.

Extended reasoning...

The bug. The new prose at content/integrations/model-providers/openai-py.mdx lines 386-387 (and the same wording duplicated in content/guides/cookbook/integration_openai_structured_output.md at lines 58 and 193, plus the matching .ipynb cells) reads:

You can pass a Pydantic model directly via response_format and still set Langfuse attributes such as name, metadata, langfuse_session_id, etc.

This sentence flattens two different things into one comma-separated list: name and metadata are real keyword arguments accepted by the Langfuse-wrapped OpenAI call, but langfuse_session_id is not — it is a key inside the metadata dict.

Why this contradicts the rest of the file. The same MDX has a 'Custom trace properties' table at lines 204-211 that lists exactly four direct kwargs: name, metadata, trace_id, parent_observation_id. langfuse_session_id is deliberately absent from that table. The 'Setting trace attributes (session_id, user_id, tags)' section that follows (lines 213-236) makes the correct usage explicit:

```python
metadata={
    "langfuse_session_id": "session_123",
    "langfuse_user_id": "user_456",
    "langfuse_tags": ["calculator"],
    ...
}
```

So the new prose contradicts the canonical documentation just 150 lines above it.

The new prose also contradicts its own example. The very next code block (line 416 in the MDX) does the right thing:

```python
completion = openai.chat.completions.parse(
    ...,
    name="extract-calendar-event",
    metadata={"langfuse_tags": ["structured-output"]},
)
```

Here langfuse_tags is correctly nested inside metadata, not passed as a direct kwarg. The example demonstrates the correct pattern; the prose above it does not.

Step-by-step proof of how a reader gets misled.

  1. Reader lands on the new 'Structured Output' section because they want session/user attribution on a parse call.

  2. They read the bullet: "You can pass a Pydantic model directly via response_format and still set Langfuse attributes such as name, metadata, langfuse_session_id, etc."

  3. By analogy with name="..." (a direct kwarg shown in the example), they write:

```python
completion = openai.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[...],
    response_format=CalendarEvent,
    name="extract-calendar-event",
    langfuse_session_id="session_abc",   # ← reader infers this from the prose
)
```
  4. The wrapper recognizes name and consumes it. langfuse_session_id is not a recognized wrapper kwarg, so it is either passed through to the underlying OpenAI client (which will reject it as an unknown parameter, or silently ignore depending on version), or simply dropped — in any case, it never makes it onto the trace as a session id.

  5. The reader's traces have no session id and they don't know why; the docs that would tell them (lines 213-236) are now contradicted by the new bullet they trusted.

Cross-file impact. The cookbook .md at lines 58 and 193 has the same sentence, but in that file the disambiguating 'Custom trace properties' / 'Setting trace attributes' sections do not exist nearby — so a cookbook reader has even less chance of catching the issue. The .ipynb carries the same wording.

Fix. Reword along the lines of:

...Both paths support Langfuse attributes such as name and metadata. To attach a session id, user id, or tags, set them as keys inside metadata (langfuse_session_id, langfuse_user_id, langfuse_tags).

This keeps the reassurance that the stable parse path is fully instrumented, while no longer presenting metadata-keys as if they were direct kwargs. The change is text-only and applies to: the new note in the MDX (around line 386), the same note in the cookbook .md (line 58 and 193), and the corresponding cells in the .ipynb.
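The suggested rewording could be reinforced with a tiny helper that makes the nesting unmistakable. This is hypothetical — `langfuse_parse_kwargs` is not part of either SDK, it only shows which keys are direct kwargs and which live inside `metadata`:

```python
# Sketch: name/metadata are direct kwargs of the Langfuse-wrapped call;
# session id, user id, and tags are keys *inside* metadata, matching the
# "Setting trace attributes" section of the docs quoted above.
def langfuse_parse_kwargs(name, session_id=None, user_id=None, tags=None):
    metadata = {}
    if session_id is not None:
        metadata["langfuse_session_id"] = session_id
    if user_id is not None:
        metadata["langfuse_user_id"] = user_id
    if tags is not None:
        metadata["langfuse_tags"] = tags
    return {"name": name, "metadata": metadata}
```

A caller would then splat the result into the parse call, e.g. `client.chat.completions.parse(..., **langfuse_parse_kwargs("extract-calendar-event", session_id="session_abc"))`, so the session id can never end up as a top-level kwarg.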

Comment on lines +384 to +388
For **structured output parsing**, you have two fully instrumented options depending on your `openai` Python SDK version:

If you rely on parsing Pydantic definitions for your `response_format`, you may leverage the `type_to_response_format_param` utility function from the OpenAI Python SDK to convert the Pydantic definition to a `response_format` dictionary. This is the same function the OpenAI Beta API uses to convert Pydantic definitions to `response_format` dictionaries.
- **`openai>=1.92.0` (recommended):** use `client.chat.completions.parse(...)`. OpenAI graduated `parse` and `stream` out of beta in [v1.92.0](https://github.com/openai/openai-python/releases/tag/v1.92.0), and Langfuse wraps the stable `openai.resources.chat.completions.Completions.parse` (and the async variant). You can pass a Pydantic model directly via `response_format` and still set Langfuse attributes such as `name`, `metadata`, `langfuse_session_id`, etc.
- **`openai<1.92.0` (legacy):** the parse helper is only available under `client.beta.chat.completions.parse(...)`. Langfuse also wraps the beta path on these older versions, so attributes like `name` and `metadata` work there too.


🟡 The parent OpenAI Beta APIs section intro (just above line 381) still says "we fully support only the stable APIs in the OpenAI SDK. If you are using a beta API, you can still use the Langfuse SDK by wrapping the OpenAI SDK manually with the @observe() decorator", but the new Structured Output subsection placed under it now explicitly states that Langfuse instruments the legacy beta parse path too. Readers following the parent intro would unnecessarily wrap with @observe(). Consider updating the intro paragraph or moving Structured Output out from under the OpenAI Beta APIs header (since parse is no longer beta on openai>=1.92.0).

Extended reasoning...

What the bug is. This PR rewrites the Structured Output subsection of content/integrations/model-providers/openai-py.mdx to explain that Langfuse instruments both the stable client.chat.completions.parse(...) (on openai>=1.92.0) and the legacy client.beta.chat.completions.parse(...) (on older SDKs). However, that subsection still lives under the parent ### OpenAI Beta APIs header, whose intro paragraph (untouched by this PR) reads:

Since OpenAI beta APIs are changing frequently across versions, we fully support only the stable APIs in the OpenAI SDK. If you are using a beta API, you can still use the Langfuse SDK by wrapping the OpenAI SDK manually with the @observe() decorator.

The new subsection directly contradicts that intro by stating:

openai<1.92.0 (legacy): the parse helper is only available under client.beta.chat.completions.parse(...). Langfuse also wraps the beta path on these older versions, so attributes like name and metadata work there too.

Why it's a problem. A reader who scans the page top-down hits the section header ("OpenAI Beta APIs"), reads the intro that says beta APIs need manual @observe() wrapping, and may stop reading there or carry that mental model forward. Even readers who continue then see the opposite claim a few lines later. Both messages can't be true at once.

Section structure is also stale. Since parse graduated out of beta in openai-python v1.92.0 (the very point this PR is making), the Structured Output content no longer belongs under an "OpenAI Beta APIs" header at all. The recommended path (client.chat.completions.parse(...)) is stable, not beta.

Step-by-step proof.

  1. Open the rendered page /integrations/model-providers/openai-py after this PR.
  2. Scroll to the ### OpenAI Beta APIs heading and read the intro paragraph: it tells the user that only stable APIs are fully supported and that beta APIs require manual @observe() wrapping.
  3. Continue into the #### Structured Output subsection just below: bullet two explicitly says Langfuse "also wraps the beta path on these older versions, so attributes like name and metadata work there too."
  4. The two statements are mutually exclusive — either the legacy beta parse path is automatically instrumented (so no @observe() is needed) or it isn't. The PR makes the new claim true but leaves the contradicting intro in place.

How to fix. Either (a) update the intro paragraph under ### OpenAI Beta APIs to clarify that some beta endpoints are now instrumented (and call out Structured Output as an example), or (b) promote #### Structured Output to its own top-level section (e.g., under "Advanced usage") since parse is no longer a beta API on supported SDK versions. Option (b) probably ages better given that parse is the recommended stable path.
