LangSmith Tracing and Observability Guide

This guide explains how to use LangSmith for comprehensive tracing and observability of the AI Development Agent workflow.

Overview

LangSmith is LangChain's official observability platform, providing tracing, monitoring, and debugging for AI workflows. The AI Development Agent system sends its agent, workflow, and LLM call traces to LangSmith.

🌐 LangSmith Dashboard

View all agent logs, workflow traces, and LLM calls at:

🔗 https://smith.langchain.com/

What You Can Monitor

Agent Executions

Input/Output Data: Complete agent input and output for each execution
Execution Time: Performance metrics for each agent
Success/Failure Status: Agent execution results and error details
Session Tracking: Complete session history across all agents

Workflow Steps

State Changes: Complete workflow state progression
Agent Transitions: How data flows between agents
Error Propagation: How errors affect the workflow
Performance Metrics: Overall workflow performance

LLM Calls

Prompt/Response Pairs: Complete LLM interactions
Model Performance: Response times and quality metrics
Token Usage: Cost and efficiency tracking
Model Selection: Which models are used for different tasks

Error Tracking

Error Context: Detailed error information with context
Stack Traces: Complete error propagation paths
Recovery Attempts: Retry logic and fallback mechanisms
Error Patterns: Identification of recurring issues

Configuration

Setup in `.streamlit/secrets.toml`

```toml
# LangSmith Configuration
LANGSMITH_TRACING = "true"
LANGSMITH_ENDPOINT = "https://api.smith.langchain.com"
LANGSMITH_API_KEY = "your-langsmith-api-key"
LANGSMITH_PROJECT = "ai-dev-agent"
```
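
The LangSmith SDK reads these settings from environment variables. The following is a minimal sketch, assuming the application copies the values from its secrets into the process environment at startup. The helper name `export_langsmith_settings` is hypothetical and not taken from the project; in a Streamlit app the `secrets` argument would typically be `st.secrets`, while the example uses a plain dictionary so that it runs on its own.

```python
import os
from typing import Mapping, MutableMapping

# The four settings shown in the configuration above.
LANGSMITH_KEYS = (
    "LANGSMITH_TRACING",
    "LANGSMITH_ENDPOINT",
    "LANGSMITH_API_KEY",
    "LANGSMITH_PROJECT",
)


def export_langsmith_settings(
    secrets: Mapping[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> list[str]:
    """Copy LangSmith settings from a secrets mapping into the environment.

    Hypothetical helper: returns the names of the keys that were exported.
    """
    exported = []
    for key in LANGSMITH_KEYS:
        value = secrets.get(key)
        if value:  # skip missing or empty values
            environ[key] = str(value)
            exported.append(key)
    return exported


# Example with an in-memory mapping standing in for the real secrets.
demo_env: dict[str, str] = {}
exported = export_langsmith_settings(
    {"LANGSMITH_TRACING": "true", "LANGSMITH_PROJECT": "ai-dev-agent"},
    demo_env,
)
```

Passing the target environment as a parameter keeps the helper easy to test without touching the real `os.environ`.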

Getting Your LangSmith API Key

Visit https://smith.langchain.com/
Sign up or log in to your account
Navigate to "API Keys" in your settings
Create a new API key
Add it to your `.streamlit/secrets.toml` file

Workflow Tracing Details

Session-Based Tracing

Each workflow execution creates a unique session with:

Session ID: Unique identifier for the entire workflow
Project Context: The original project description
Agent Sequence: Complete sequence of agent executions
Timeline: Chronological execution timeline
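
As an illustration of the session fields listed above, the sketch below builds a configuration dictionary using the `run_name`, `tags`, and `metadata` keys that LangChain runnables accept. The function name, the metadata key names, and the agent names are assumptions made for this example, not the project's actual implementation.

```python
import uuid
from datetime import datetime, timezone


def build_session_config(project_description: str, agent_sequence: list[str]) -> dict:
    """Build a LangChain-style config dict describing one workflow session."""
    session_id = str(uuid.uuid4())  # unique identifier for the whole workflow
    return {
        "run_name": f"workflow-{session_id[:8]}",
        "tags": ["ai-dev-agent", "workflow"],
        "metadata": {
            "session_id": session_id,
            "project_context": project_description,
            "agent_sequence": agent_sequence,
            "started_at": datetime.now(timezone.utc).isoformat(),
        },
    }


# Hypothetical agent names, used only for illustration.
config = build_session_config(
    "Build a REST API for a todo app",
    ["requirements_analyst", "architecture_designer", "code_generator"],
)
```

A dictionary of this shape could be passed as the `config` argument when invoking a runnable, so that every run in the session carries the same session ID.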

Agent-Level Tracing

Each agent execution is traced with:

Agent Name: Which agent is executing
Input Data: Complete input state and context
Output Data: Agent's complete output
Execution Time: Performance metrics
Error Details: Any failures or issues
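
To make the traced fields concrete, here is a standard-library sketch of a decorator that records the same five items for each agent call. It is an illustrative stand-in and not the project's code: `trace_agent`, `TRACE_LOG`, and the sample agent are hypothetical. The LangSmith SDK's `traceable` decorator records comparable data and sends it to LangSmith instead of a local list.

```python
import functools
import time
from typing import Any, Callable

TRACE_LOG: list[dict[str, Any]] = []  # local stand-in for a tracing backend


def trace_agent(agent_name: str) -> Callable:
    """Record name, input, output, duration, and error for each agent call."""

    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(func)
        def wrapper(state: dict) -> dict:
            record: dict[str, Any] = {
                "agent_name": agent_name,
                "input": dict(state),  # shallow copy of the input state
                "output": None,
                "error": None,
            }
            start = time.perf_counter()
            try:
                record["output"] = func(state)
                return record["output"]
            except Exception as exc:
                record["error"] = f"{type(exc).__name__}: {exc}"
                raise  # the caller still sees the original failure
            finally:
                record["execution_time_s"] = time.perf_counter() - start
                TRACE_LOG.append(record)

        return wrapper

    return decorator


@trace_agent("requirements_analyst")
def analyze_requirements(state: dict) -> dict:
    # Hypothetical agent: returns a new state with one added field.
    return {**state, "requirements": ["user login", "task list"]}


result = analyze_requirements({"project_context": "todo app"})
```

Writing the record in the `finally` clause means failed executions are logged with their error details as well as successful ones.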

LLM Call Tracing

Individual LLM calls are traced with:

Model Used: Which Gemini model was selected
Prompt Content: Complete prompt sent to the model
Response Content: Complete response received
Token Usage: Input/output token counts
Response Time: How long the call took

Using LangSmith Dashboard

Project Organization

The system creates multiple projects in LangSmith, each named with the `ai-dev-agent` prefix:

ai-dev-agent-agents: Individual agent executions
ai-dev-agent-workflow: Complete workflow steps
ai-dev-agent-llm: Individual LLM calls
ai-dev-agent-errors: Error tracking and debugging
ai-dev-agent-metrics: Performance metrics
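
The naming scheme above can be sketched as a small helper that maps a trace category to its project name. The function and constant names are hypothetical; only the resulting project names come from the list above.

```python
PROJECT_PREFIX = "ai-dev-agent"
TRACE_CATEGORIES = ("agents", "workflow", "llm", "errors", "metrics")


def project_name_for(category: str, prefix: str = PROJECT_PREFIX) -> str:
    """Return the LangSmith project name for a trace category."""
    if category not in TRACE_CATEGORIES:
        raise ValueError(
            f"unknown trace category {category!r}; expected one of {TRACE_CATEGORIES}"
        )
    return f"{prefix}-{category}"


names = [project_name_for(category) for category in TRACE_CATEGORIES]
```

Rejecting unknown categories keeps a typo from silently creating an extra project.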

Dashboard Features

Run Explorer

Filter by Agent: View executions for specific agents
Filter by Status: Success, failure, or in-progress runs
Filter by Time: Recent executions or specific time ranges
Search: Find specific runs by content or metadata

Trace View

Complete Workflow: See the entire agent sequence
State Changes: How the workflow state changes from one agent to the next
Error Points: Where failures occur in the workflow
Performance Bottlenecks: Identify slow agents or LLM calls

Prompt Management

Prompt Versioning: Track prompt changes over time
Performance Comparison: Compare prompt effectiveness
A/B Testing: Test different prompt variations
Optimization: Identify opportunities for prompt improvement

Debugging with LangSmith

Error Investigation

Find the Error: Use the ai-dev-agent-errors project to locate failed runs
Trace the Error: Follow the error through the workflow
Context Analysis: Examine the state when the error occurred
Root Cause: Identify the underlying cause of the failure

Performance Optimization

Identify Bottlenecks: Find slow agents or LLM calls
Token Usage Analysis: Optimize for cost and efficiency
Model Selection: Compare performance across different models
Prompt Optimization: Improve prompt effectiveness
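
The bottleneck and token analysis above can also be done offline once run data is available as plain records. The sketch below assumes each LLM call is a dictionary with `model`, `response_time_s`, `input_tokens`, and `output_tokens` keys. That record shape, the function names, and the model names are assumptions for illustration, not a LangSmith export format.

```python
from collections import defaultdict
from statistics import mean


def summarize_llm_calls(calls: list[dict]) -> dict[str, dict]:
    """Group LLM call records by model and summarize latency and tokens."""
    by_model: dict[str, list[dict]] = defaultdict(list)
    for call in calls:
        by_model[call["model"]].append(call)

    summary = {}
    for model, records in by_model.items():
        summary[model] = {
            "calls": len(records),
            "avg_response_time_s": mean(r["response_time_s"] for r in records),
            "total_tokens": sum(
                r["input_tokens"] + r["output_tokens"] for r in records
            ),
        }
    return summary


def slowest_model(summary: dict[str, dict]) -> str:
    """Return the model with the highest average response time."""
    return max(summary, key=lambda model: summary[model]["avg_response_time_s"])


# Placeholder model names and made-up numbers, for illustration only.
calls = [
    {"model": "model-a", "response_time_s": 2.0, "input_tokens": 100, "output_tokens": 50},
    {"model": "model-a", "response_time_s": 4.0, "input_tokens": 200, "output_tokens": 150},
    {"model": "model-b", "response_time_s": 1.0, "input_tokens": 80, "output_tokens": 20},
]
summary = summarize_llm_calls(calls)
```

Here `slowest_model(summary)` points at the model worth examining first, and `total_tokens` gives a rough basis for comparing cost across models.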

Workflow Analysis

Agent Effectiveness: Which agents are most successful
Data Flow: How information flows between agents
State Management: How workflow state evolves
Error Patterns: Common failure points and patterns

Integration with Development

Local Development

When running locally with LANGSMITH_TRACING set to "true" and a valid API key, traces are sent to LangSmith automatically:

Real-time Monitoring: See traces as they happen
Debugging Support: Use traces for debugging issues
Performance Tracking: Monitor local development performance

Production Deployment

For production deployments:

Environment Separation: Separate projects for different environments
Performance Monitoring: Track production performance
Error Alerting: Set up alerts for critical failures
Usage Analytics: Monitor system usage and patterns

Best Practices

Effective Tracing

Meaningful Names: Use descriptive names for runs and sessions
Rich Metadata: Include relevant context in metadata
Consistent Tagging: Use consistent tags for categorization
Error Context: Include sufficient context for error debugging

Performance Monitoring

Regular Review: Regularly review performance metrics
Trend Analysis: Track performance trends over time
Alert Setup: Set up alerts for performance degradation
Optimization: Continuously optimize based on metrics

Debugging Workflow

Start with Errors: Begin debugging by examining error traces
Follow the Data: Trace data flow through the workflow
Check Context: Examine the context when issues occur
Test Hypotheses: Use traces to test debugging hypotheses

Troubleshooting

Common Issues

No Traces Appearing

Check that LANGSMITH_API_KEY is set correctly
Verify LANGSMITH_TRACING is set to "true"
Ensure the LangSmith endpoint is accessible
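
The first two checks, plus a basic sanity check on the endpoint URL, can be automated with a short script. This is a minimal sketch: `check_langsmith_env` is a hypothetical helper, and it validates only the format of the endpoint, not whether it is reachable over the network.

```python
import os
from typing import Mapping
from urllib.parse import urlparse


def check_langsmith_env(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return a list of configuration problems; an empty list means none found."""
    problems = []

    if not environ.get("LANGSMITH_API_KEY", "").strip():
        problems.append("LANGSMITH_API_KEY is missing or empty")

    if environ.get("LANGSMITH_TRACING", "").strip().lower() != "true":
        problems.append('LANGSMITH_TRACING is not set to "true"')

    # The endpoint is optional; validate its shape only when it is set.
    endpoint = environ.get("LANGSMITH_ENDPOINT", "").strip()
    if endpoint:
        parsed = urlparse(endpoint)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"LANGSMITH_ENDPOINT is not a valid URL: {endpoint!r}")

    return problems


good_env = {
    "LANGSMITH_API_KEY": "your-langsmith-api-key",
    "LANGSMITH_TRACING": "true",
    "LANGSMITH_ENDPOINT": "https://api.smith.langchain.com",
}
problems_when_empty = check_langsmith_env({})
problems_when_good = check_langsmith_env(good_env)
```

Calling `check_langsmith_env()` with no arguments inspects the real process environment.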

Missing Agent Traces

Verify that agents are using the LangChain logging manager
Check that session IDs are being passed correctly
Ensure callback managers are properly configured

Performance Issues

Check for network connectivity to LangSmith
Verify that tracing isn't causing performance degradation
Consider reducing trace detail for high-volume operations
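
One way to reduce trace volume is to trace only a fraction of sessions. The sketch below shows deterministic, hash-based sampling at the application level, so that a given session is either fully traced or not traced at all. The function name and the choice to sample by session ID are assumptions, not part of the project or of LangSmith.

```python
import hashlib


def should_trace(session_id: str, sample_rate: float) -> bool:
    """Deterministically decide whether a session should be traced.

    The same session_id always yields the same decision for a given rate.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0.0 and 1.0")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:4], "big") / 2**32
    return bucket < sample_rate


# Count how many of 1000 hypothetical sessions would be traced at a 25% rate.
sampled = sum(should_trace(f"session-{i}", 0.25) for i in range(1000))
```

Hashing the session ID rather than drawing a random number keeps every agent and LLM call within one session on the same side of the decision.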

Getting Help

LangSmith Documentation: https://docs.smith.langchain.com/
LangSmith Community: Join the LangSmith Discord for support
Project Issues: Report LangSmith integration issues in the project repository

Benefits

For Developers

Comprehensive Debugging: Complete visibility into workflow execution
Performance Optimization: Identify and fix performance bottlenecks
Error Resolution: Quickly identify and resolve issues
Workflow Understanding: Better understanding of agent interactions

For Users

Reliability: Better error handling and debugging capabilities
Performance: Optimized workflows through performance analysis
Transparency: Complete visibility into how the system works
Quality: Improved system quality through better monitoring

For the Project

Quality Assurance: Better testing and validation capabilities
Continuous Improvement: Data-driven optimization opportunities
Documentation: Automatic documentation of workflow behavior
Collaboration: Better collaboration through shared observability

Conclusion

LangSmith provides observability across the AI Development Agent workflow, from individual LLM calls to complete sessions, which supports debugging, performance optimization, and system understanding. With these traces, developers can build more reliable and efficient AI workflows.

Visit https://smith.langchain.com/ to start exploring your workflow traces and gain deeper insights into your AI agent system.