Skip to content

feat(langchain): mark LangChain root observations in metadata#1604

Merged
hassiebp merged 3 commits intomainfrom
codex/langchain-root-metadata
Apr 1, 2026
Merged

feat(langchain): mark LangChain root observations in metadata#1604
hassiebp merged 3 commits intomainfrom
codex/langchain-root-metadata

Conversation

@hassiebp
Copy link
Copy Markdown
Contributor

@hassiebp hassiebp commented Apr 1, 2026

Summary

  • mark root LangChain observations with is_langchain_root: true in callback metadata
  • apply the flag consistently at the root callback boundary, including root chains and standalone root LLM runs
  • add focused regression tests covering a root chain and a root LLM callback

Why

Root observations created through the LangChain callback handler were not explicitly marked in observation metadata. That made it harder for downstream consumers to identify the LangChain-created root observation reliably.

Impact

Consumers can now detect root LangChain observations via the is_langchain_root metadata key without changing child observation behavior.

Validation

  • uv run pytest -q tests/test_langchain.py -k "is_langchain_root_metadata"

Disclaimer: Experimental PR review

Greptile Summary

This PR adds an is_langchain_root: True metadata flag to all root (top-level, no parent_run_id) LangChain observations, making it easier for downstream consumers to identify the entry-point span of a LangChain-driven trace without inspecting the call hierarchy.

Key changes:

  • A new private helper _get_langchain_observation_metadata wraps the existing __join_tags_and_metadata, injecting is_langchain_root: True when parent_run_id is None; child observations are unaffected.
  • The helper is applied uniformly across all four observation-creating callbacks: on_chain_start, on_tool_start, on_retriever_start, and __on_llm_action (covering both on_llm_start and on_chat_model_start).
  • Two focused unit tests validate the root chain and root LLM paths using a lightweight fake client, with no external dependencies.
  • Minor: on_tool_start merges arbitrary extra kwargs into meta after the flag is set, meaning a caller who happens to pass is_langchain_root in kwargs could silently overwrite it; the flag should ideally be applied last to be safe.
  • importlib.import_module(...) is called inside the _patch_langchain_client test helper rather than as a top-level import, which is inconsistent with the project's import convention.

Confidence Score: 5/5

  • Safe to merge; all remaining findings are P2 style/hygiene suggestions that do not affect correctness.
  • The core logic is simple and correct — is_langchain_root is injected only for root observations and never touches child spans. The two new tests exercise the changed paths with a proper fake, and the existing test suite is unchanged. Both open findings are P2: a test-only import style issue and a theoretical (not practically reachable today) flag-overwrite edge case in on_tool_start.
  • No files require special attention.

Important Files Changed

Filename Overview
langfuse/langchain/CallbackHandler.py Adds _get_langchain_observation_metadata helper that wraps __join_tags_and_metadata and injects is_langchain_root: True for all root observations (no parent_run_id); applied consistently to on_chain_start, on_tool_start, on_retriever_start, and __on_llm_action.
tests/test_langchain.py Adds two focused unit tests for the new flag using a lightweight fake client/observation; covers root chain and root LLM paths but not root tool or retriever; uses importlib.import_module inside a helper function instead of a top-level import.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["LangChain callback fires\n(on_chain_start / on_llm_start /\non_tool_start / on_retriever_start)"] --> B{"parent_run_id\nis None?"}
    B -- "Yes (root run)" --> C["__join_tags_and_metadata\n(tags, metadata)"]
    C --> D["observation_metadata\n(may be None)"]
    D --> E["root_metadata = observation_metadata.copy() or {}"]
    E --> F["root_metadata['is_langchain_root'] = True"]
    F --> G["start_observation(metadata=root_metadata)"]
    B -- "No (child run)" --> H["__join_tags_and_metadata\n(tags, metadata)"]
    H --> I["start_observation(metadata=observation_metadata)"]
Loading

Comments Outside Diff (1)

  1. langfuse/langchain/CallbackHandler.py, line 719-730 (link)

    P2 is_langchain_root can be silently overwritten by kwargs

    For on_tool_start, after _get_langchain_observation_metadata returns {"is_langchain_root": True} (root run, no tags/metadata), the meta.update(kwargs) call immediately below merges all non-None kwargs into meta. If a caller ever passes is_langchain_root among the extra keyword arguments, the flag will be silently overwritten without warning.

    While this is unlikely in practice today, setting the flag after the kwargs merge would be more resilient:

    meta = self._get_langchain_observation_metadata(
        parent_run_id=parent_run_id,
        tags=tags,
        metadata=metadata,
    )
    
    if not meta:
        meta = {}
    
    meta.update(
        {key: value for key, value in kwargs.items() if value is not None}
    )
    
    if parent_run_id is None:
        meta["is_langchain_root"] = True

    Alternatively, the _get_langchain_observation_metadata helper itself could be the canonical place — but the update call should happen first so the flag wins.

Reviews (1): Last reviewed commit: "Merge branch 'main' into codex/langchain..." | Re-trigger Greptile

Context used:

  • Rule used - Move imports to the top of the module instead of p... (source)

Learnt From
langfuse/langfuse-python#1387

@hassiebp hassiebp marked this pull request as ready for review April 1, 2026 16:39
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 1, 2026

@claude review

Copy link
Copy Markdown

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — clean, well-scoped change with focused tests.

Extended reasoning...

Overview

Adds a _get_langchain_observation_metadata helper to CallbackHandler.py that wraps the existing __join_tags_and_metadata call and injects is_langchain_root: True when parent_run_id is None. The helper is applied consistently at all four callback entry points: on_chain_start, on_tool_start, on_retriever_start, and __on_llm_action. Two unit tests using a lightweight fake-client pattern are added to tests/test_langchain.py.

Security risks

None. The change only writes a boolean flag into observation metadata; it does not touch auth, credentials, or any trust boundary.

Level of scrutiny

Low. This is an additive, metadata-only change that cannot break existing behaviour: consumers who do not look for is_langchain_root are unaffected, and child observations are explicitly left unchanged. The helper correctly uses .copy() to avoid mutating the dict returned by __join_tags_and_metadata.

Other factors

The two new tests directly exercise the flag for both the chain and LLM paths without requiring a live API, making them fast and reliable. The implementation follows the existing codebase style throughout.

@hassiebp hassiebp changed the title [codex] Mark LangChain root observations in metadata feat(langchain): mark LangChain root observations in metadata Apr 1, 2026
@hassiebp hassiebp merged commit 6f9eaf2 into main Apr 1, 2026
13 checks passed
@hassiebp hassiebp deleted the codex/langchain-root-metadata branch April 1, 2026 16:59
assert langchain_generation_span.input != ""
assert langchain_generation_span.output is not None
assert langchain_generation_span.output != ""
assert langchain_generation_span.metadata["is_langchain_root"] is True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The test test_callback_generated_from_trace_chat incorrectly asserts is_langchain_root on the GENERATION span rather than the CHAIN wrapper that is the actual LangChain root. When ChatOpenAI.invoke() is called, LangChain fires on_chain_start with parent_run_id=None first (creating a CHAIN wrapper), then on_chat_model_start with a non-None parent_run_id, so the flag lands on the CHAIN, not the GENERATION. The test should filter for the root CHAIN span (as test_callback_generated_from_lcel_chain correctly does) rather than asserting is_langchain_root on the GENERATION span.

Extended reasoning...

What the bug is and how it manifests

Line 70 asserts langchain_generation_span.metadata["is_langchain_root"] is True on the ChatOpenAI GENERATION span. The _get_langchain_observation_metadata helper only sets is_langchain_root=True when parent_run_id is None. When ChatOpenAI.invoke() is called, LangChain fires two callbacks: on_chain_start (with parent_run_id=None, creating a CHAIN wrapper as the LangChain root) and then on_chat_model_start (with parent_run_id=<chain_run_id>, a non-None value). So the CHAIN wrapper gets is_langchain_root=True, while the GENERATION does not.

The specific code path that triggers it

In __on_llm_action, the metadata is set via self._get_langchain_observation_metadata(parent_run_id=parent_run_id, ...). When on_chat_model_start fires for a ChatOpenAI.invoke() call, parent_run_id equals the run ID of the previously created CHAIN wrapper (not None). The helper function reaches if parent_run_id is not None: return observation_metadata without setting the is_langchain_root key.

Why existing evidence confirms this, not refutes it

The test itself already asserts len(trace.observations) == 3 (line 54), which means there are exactly 3 observations: (1) the Langfuse parent from start_as_current_observation, (2) a CHAIN wrapper created by on_chain_start, and (3) a GENERATION created by on_chat_model_start. If LangChain only fired on_chat_model_start with parent_run_id=None (as refuters claim), there would be only 2 observations total, contradicting line 54. This count of 3 is also independently confirmed by test_multimodal, which similarly calls model.invoke() directly and asserts len(trace.observations) == 3. Furthermore, git history shows commit 840cf2a explicitly changed the observation count in this test from 2 to 3, documenting the LangChain behavioral change where on_chain_start now fires before on_chat_model_start even for direct model invocations.

Step-by-step proof

  1. chat.invoke(messages, config={"callbacks": [handler]}) is called.
  2. LangChain fires on_chain_start(run_id=UUID_A, parent_run_id=None)_get_langchain_observation_metadata sees parent_run_id is None, sets is_langchain_root=True on the CHAIN observation.
  3. LangChain fires on_chat_model_start(run_id=UUID_B, parent_run_id=UUID_A)_get_langchain_observation_metadata sees parent_run_id is not None, returns metadata without is_langchain_root.
  4. The test filters for o.type == "GENERATION" and o.name == "ChatOpenAI", finding observation UUID_B.
  5. langchain_generation_span.metadata["is_langchain_root"] is either missing or None, causing the assertion on line 70 to fail (either KeyError or AssertionError).

How to fix it

Mirror the pattern used in test_callback_generated_from_lcel_chain: filter all observations for those with observation.metadata.get("is_langchain_root") and assert the single result has type == "CHAIN". Remove the incorrect line 70 assertion from test_callback_generated_from_trace_chat and replace it with a check that the CHAIN wrapper (not the GENERATION) carries the flag.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant