docs: add Langextract observability integration page (new content) #173

@MervinPraison

Summary

The langextract observability integration is fully implemented in the core wrapper (praisonai/observability/langextract.py) and exposed via CLI (praisonai langextract view|render) and the --observe langextract flag, but it has no user-facing documentation in this repo.

This issue requests new content (not an update) — specifically a new page at docs/observability/langextract.mdx, plus two small cross-reference updates on existing pages.

Decision: Create new content — confirmed by:

  • grep -r langextract docs/ returns zero matches (only the daily-synced SDK source under praisonai/ shows up, which is not a user doc).
  • docs/observability/overview.mdx lists 20+ providers; langextract is absent from the supported-providers table.
  • docs.json Observability group (lines 720–740) has no langextract entry.
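The first check above can also be scripted where grep is unavailable. This is a minimal sketch, assuming the docs live in a docs/ directory relative to the working directory; find_mentions is a hypothetical helper name, not part of PraisonAI.

```python
from pathlib import Path


def find_mentions(root: str, needle: str = "langextract") -> list:
    """Return sorted paths of files under `root` whose text contains `needle`.

    The match is case-sensitive, like a plain `grep -r`.
    """
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if needle in text:
            hits.append(str(path))
    return sorted(hits)
```

An empty list from find_mentions("docs") corresponds to the zero-match grep result reported above.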

Source PRs (SDK ground truth)

  • Feature introduction: MervinPraison/PraisonAI#1413 (langextract TraceSinkProtocol adapter + CLI commands)
  • Bug fix / event-capture bridge: MervinPraison/PraisonAI#1420 — merged 2026-04-17 on main @ 2bd0287828a1bf055d35be363743c8333cbaa1c6
    • Adds _ContextToActionBridge so normal Agent.start() flows emit events (before this fix, only RouterAgent / PlanningAgent events were captured, so render produced an empty HTML).

What is langextract?

An optional observability integration that turns a PraisonAI agent run into a self-contained interactive HTML trace, grounded in the original input text. Unlike hosted providers (Langfuse, LangSmith, etc.), it writes two local files:

  • <name>.jsonl — annotated-documents JSONL (langextract format)
  • <name>.html — interactive visualization (opens in browser)

Extractions produced per run:

| Event | Extraction class | Grounded |
| --- | --- | --- |
| AGENT_START | agent_run | First 200 chars of input |
| TOOL_START | tool_call | No (ungrounded) |
| TOOL_END | tool_result | No |
| OUTPUT | final_output | First 1000 chars of output |
| ERROR | error | No |
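The mapping above is small enough to restate as data, which may help when writing the Extraction Mapping section of the page. The sketch below is illustrative only: the dictionary values are copied from the table, while to_extraction and the shape of the dict it returns are hypothetical and are not the SDK's implementation.

```python
# Event name -> extraction class, copied from the table above.
EXTRACTION_CLASS = {
    "AGENT_START": "agent_run",
    "TOOL_START": "tool_call",
    "TOOL_END": "tool_result",
    "OUTPUT": "final_output",
    "ERROR": "error",
}

# Grounded events and how many leading characters of text they keep.
GROUNDING_LIMIT = {"AGENT_START": 200, "OUTPUT": 1000}


def to_extraction(event: str, text: str = "") -> dict:
    """Build a plain-dict stand-in for one extraction (hypothetical shape)."""
    limit = GROUNDING_LIMIT.get(event)
    return {
        "extraction_class": EXTRACTION_CLASS[event],
        "grounded": limit is not None,
        # Ungrounded events carry no source text in this sketch.
        "text": text[:limit] if limit is not None else None,
    }
```

For example, to_extraction("AGENT_START", prompt) keeps only the first 200 characters of prompt, while to_extraction("TOOL_START") is marked ungrounded.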

SDK ground truth — files to read before writing docs

Per AGENTS.md §1.1 / §1.3 (SDK-first), read these before authoring the page:

| Concern | File (in praisonai/ synced tree of this repo) |
| --- | --- |
| Sink + config dataclass | `praisonai/observability/langextract.py` |
| Package export (lazy) | `praisonai/observability/__init__.py` |
| view / render CLI | `praisonai/cli/commands/langextract.py` |
| --observe langextract wiring | `praisonai/cli/app.py`: `_setup_langextract_observability` |
| Context-event bridge (added in #1420) | `praisonai/observability/langextract.py`: `_ContextToActionBridge`, `LangextractSink.context_sink()`, `LangextractSink.bridge_context_events()` |

Configuration options (extract verbatim — do not guess)

From @dataclass LangextractSinkConfig in praisonai/observability/langextract.py:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| output_path | str | "praisonai-trace.html" | HTML file written on close() |
| jsonl_path | Optional[str] | None (derived from output_path) | Annotated-documents JSONL path |
| document_id | str | "praisonai-run" | Document ID in the JSONL |
| auto_open | bool | False | Open the HTML in a browser after render |
| include_llm_content | bool | True | Include response text in attributes |
| include_tool_args | bool | True | Include tool args in attributes |
| enabled | bool | True | Master switch |
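To make the defaults above easy to compare against the source, the table can be mirrored as a standalone dataclass. This is a sketch, not the SDK class: SinkConfigSketch and resolved_jsonl_path are hypothetical names, and the derivation rule (swap the output suffix for .jsonl) is an assumption inferred from the programmatic example, where output_path="trace.html" yields trace.jsonl. Check the real rule in praisonai/observability/langextract.py before documenting it.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SinkConfigSketch:
    """Stand-in mirroring the LangextractSinkConfig table; not the SDK class."""

    output_path: str = "praisonai-trace.html"
    jsonl_path: Optional[str] = None
    document_id: str = "praisonai-run"
    auto_open: bool = False
    include_llm_content: bool = True
    include_tool_args: bool = True
    enabled: bool = True

    def resolved_jsonl_path(self) -> str:
        # An explicit jsonl_path always wins.
        if self.jsonl_path is not None:
            return self.jsonl_path
        # ASSUMPTION: the derived path replaces the suffix with .jsonl.
        return str(Path(self.output_path).with_suffix(".jsonl"))
```

With the defaults, SinkConfigSketch().resolved_jsonl_path() gives "praisonai-trace.jsonl" under that assumption.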

Install

pip install 'praisonai[langextract]'

User-facing entry points to document

  1. CLI — render a YAML workflow
    praisonai langextract render workflow.yaml -o trace.html
    praisonai langextract render workflow.yaml -o trace.html --no-open
  2. CLI — view an existing JSONL
    praisonai langextract view trace.jsonl -o trace.html
  3. CLI — instrument any praisonai run
    praisonai --observe langextract agents.yaml
  4. Programmatic (agent-centric — lead with this, per AGENTS.md §1.1.9)
    from praisonaiagents import Agent
    from praisonaiagents.trace.protocol import TraceEmitter, set_default_emitter
    from praisonai.observability import LangextractSink, LangextractSinkConfig
    
    sink = LangextractSink(LangextractSinkConfig(output_path="trace.html", auto_open=True))
    set_default_emitter(TraceEmitter(sink=sink, enabled=True))
    LangextractSink.bridge_context_events(sink=sink, session_id="my-run")  # captures Agent.start/tool/llm
    
    agent = Agent(name="Writer", instructions="Write a haiku about code.")
    agent.start("Write a haiku about code.")
    
    sink.close()  # writes trace.jsonl + trace.html

The bridge_context_events(...) call is required for typical single-agent flows. Without it, only RouterAgent token-usage and PlanningAgent.plan_created events are captured and the HTML will be empty. This is the exact gap fixed by #1420 — please lead with the bridged example and only mention the un-bridged path as a footnote.
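Because rendering happens in sink.close(), an exception raised during the agent run would skip the final line of the example and leave no trace files. One way to guard against that is sketched below with the standard library's contextlib.closing. FakeSink is a hypothetical stand-in so the snippet runs without PraisonAI installed, and run_with_sink is an illustrative helper name. This issue does not say whether LangextractSink supports the with statement itself, so the sketch relies only on the close() method shown in the example.

```python
from contextlib import closing


class FakeSink:
    """Hypothetical stand-in for LangextractSink; only close() is modelled."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # In the real sink, this is the point where the JSONL and HTML are written.
        self.closed = True


def run_with_sink(sink, work):
    """Run `work()` and call sink.close() afterwards, even if `work` raises."""
    with closing(sink):
        return work()
```

In the page's own example, work would be a small function wrapping agent.start(...), with the real sink passed in place of FakeSink.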

Files to create / modify

1. Create docs/observability/langextract.mdx (NEW)

Must follow AGENTS.md §2 page template exactly. Suggested section skeleton:

---
title: "Langextract"
sidebarTitle: "Langextract"
description: "Render PraisonAI agent runs as self-contained interactive HTML traces grounded in the input text"
icon: "file-code"
---

<intro sentence>

<hero mermaid — see AGENTS.md §3.3, LR graph: Agent → Sink → JSONL+HTML → Browser>

## Quick Start
  <Steps>
    <Step title="Install">  pip install 'praisonai[langextract]'  </Step>
    <Step title="Agent-centric (recommended)">  … code with bridge_context_events …  </Step>
    <Step title="CLI — render a workflow">  praisonai langextract render …  </Step>
  </Steps>

## How It Works
  sequenceDiagram: User → CLI/Agent → LangextractSink → ContextTraceEmitter (bridged) → JSONL → HTML

## Configuration Options
  <full LangextractSinkConfig table above>

## CLI Reference
  - praisonai langextract render <yaml> [-o FILE] [--no-open] [--api-url URL]
  - praisonai langextract view <jsonl> [-o FILE] [--no-open]
  - praisonai --observe langextract <agents.yaml>

## Extraction Mapping
  <event → extraction_class table above>

## Common Patterns
  - Single-agent Agent.start() with bridge
  - YAML workflow via render
  - Post-hoc: re-render an existing JSONL with view

## Troubleshooting
  - "Trace was not rendered" / empty HTML → ensure bridge_context_events(...) is called (fixed in PR #1420)
  - ImportError → pip install 'praisonai[langextract]'

## Best Practices (AccordionGroup)
  - Always call sink.close() (render happens there); CLI commands do this automatically
  - Use auto_open=False in CI
  - Scope the emitter: restore the previous emitter after your run

## Related (CardGroup)
  - /observability/overview
  - /observability/langfuse   (hosted alternative)

Mermaid colours must follow AGENTS.md §3.1.

2. Update docs/observability/overview.mdx (small edit)

Add one row to the supported-providers table (around line 50):

| [Langextract](/observability/langextract) | – (local HTML) | `pip install 'praisonai[langextract]'` |

Consider noting in a small callout that langextract is a local file sink (HTML + JSONL) rather than a hosted backend — it’s qualitatively different from the rest of the table.

3. Update docs.json (small edit)

Insert "docs/observability/langextract" into the Observability group (lines 720–740); the natural spot is right after overview. Keep the alphabetical / logical ordering consistent with its neighbours, and verify that the JSON remains valid after the edit (AGENTS.md §1.9).
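The insert-and-validate step can be sketched as follows. Two assumptions are built in: the Observability group's pages are a flat list of strings, and the overview entry is spelled "docs/observability/overview" by analogy with the new entry. Neither has been confirmed against the actual docs.json, and insert_page and is_valid_json are hypothetical helper names.

```python
import json


def insert_page(pages: list, new_page: str, after: str) -> list:
    """Return a copy of `pages` with `new_page` placed right after `after`.

    Leaves the list unchanged if `new_page` is already present, and appends
    at the end if `after` is not found.
    """
    result = list(pages)
    if new_page in result:
        return result
    index = result.index(after) + 1 if after in result else len(result)
    result.insert(index, new_page)
    return result


def is_valid_json(text: str) -> bool:
    """Report whether `text` parses as JSON (e.g. no trailing commas)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

Running is_valid_json on the edited file's contents before committing catches the most common failure here, a stray trailing comma after the new entry.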

Placement rules (AGENTS.md §1.8)

  • docs/observability/ is agent-writable.
  • ❌ Do not touch docs/concepts/ — this feature is a provider integration, not a core concept.
  • ❌ Do not touch docs/js/ or docs/rust/ — langextract is Python-only; those trees are auto-generated.
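These placement rules are easy to check mechanically before opening a PR. The sketch below flags changed paths that fall under the protected trees named above. protected_paths is a hypothetical helper, and the list of changed files is assumed to come from elsewhere (for example, the output of git diff --name-only).

```python
from pathlib import PurePosixPath

# Trees this issue says must not be edited.
PROTECTED = ("docs/concepts", "docs/js", "docs/rust")


def protected_paths(changed: list) -> list:
    """Return the entries of `changed` that live under a protected tree."""
    flagged = []
    for raw in changed:
        path = PurePosixPath(raw)
        # is_relative_to compares whole path components, so a file such as
        # docs/json-notes.mdx is not mistaken for something under docs/js.
        if any(path.is_relative_to(root) for root in PROTECTED):
            flagged.append(raw)
    return flagged
```

An empty result means the change set touches none of docs/concepts/, docs/js/, or docs/rust/.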

Acceptance checklist (per AGENTS.md §9)

  • docs/observability/langextract.mdx created and renders in Mintlify
  • Frontmatter complete (title, sidebarTitle, description, icon)
  • Hero Mermaid diagram present using the standard colour scheme
  • <Steps>, <AccordionGroup>, <CardGroup> all used
  • Every LangextractSinkConfig field documented with exact type + default from source
  • Agent-centric code example leads (AGENTS.md §1.1.9) and includes bridge_context_events(...)
  • CLI section covers all three surfaces: render, view, --observe langextract
  • Troubleshooting section calls out the PR #1420 empty-trace failure mode
  • docs/observability/overview.mdx updated with a langextract row
  • docs.json updated with docs/observability/langextract; JSON validates
  • No edits to docs/concepts/, docs/js/, or docs/rust/
  • All code examples runnable copy-paste (AGENTS.md §5.1)
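Several checklist items can be verified with a short script rather than by eye. The sketch below covers only the mechanical ones: frontmatter keys, the three required components, and the presence of bridge_context_events in the page. checklist_gaps and frontmatter_keys are hypothetical helpers, and the frontmatter parsing is deliberately naive (top-level key: value lines between the two --- markers), so treat it as a rough pre-check rather than a Mintlify validator.

```python
REQUIRED_FRONTMATTER = ("title", "sidebarTitle", "description", "icon")
# Prefix match, so tags with attributes (e.g. <CardGroup cols={2}>) still count.
REQUIRED_COMPONENTS = ("<Steps", "<AccordionGroup", "<CardGroup")


def frontmatter_keys(mdx: str) -> set:
    """Collect top-level keys from a leading ----delimited frontmatter block."""
    lines = mdx.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        if ":" in line and not line.startswith((" ", "\t")):
            keys.add(line.split(":", 1)[0].strip())
    return set()  # closing --- never found: treat frontmatter as missing


def checklist_gaps(mdx: str) -> list:
    """List the mechanical checklist items that the page text fails."""
    keys = frontmatter_keys(mdx)
    gaps = ["frontmatter: " + k for k in REQUIRED_FRONTMATTER if k not in keys]
    gaps += ["component: " + c for c in REQUIRED_COMPONENTS if c not in mdx]
    if "bridge_context_events(" not in mdx:
        gaps.append("example: bridge_context_events")
    return gaps
```

An empty list means those items pass; rendering in Mintlify and the runnability of the examples still need a manual check.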

References

  • PraisonAI PR #1413 (feature introduction)
  • PraisonAI PR #1420 (bugfix wiring the context emitter bridge — merged on main)
  • Source-of-truth files listed in the “SDK ground truth” table above

Metadata

Assignees

No one assigned

Labels

bug, claude, documentation, new-content, observability
