Week 10: Reliability, Monitoring & LLM Ops
Part of the LLM Engineering & Deployment Certification Program
This repository contains code examples for monitoring, observability, cost management, and security of LLM systems in production. The module covers:
- LangFuse - Open-source LLM tracing and observability
- LangSmith - LangChain's tracing and debugging platform
- LiteLLM - Unified proxy for cost tracking, alerting, and multi-provider routing
- Bifrost - LLM gateway for model switching and load balancing
- CloudWatch - AWS infrastructure monitoring for Bedrock
Prerequisites:

- Python 3.10+
- Docker and Docker Compose
- OpenAI API key
- Anthropic API key (optional)
- LangFuse account (free self-hosted or cloud)
- LangSmith account (free tier available)
Create a virtual environment:
```bash
python -m venv venv
```

Activate the virtual environment:

```bash
# On Windows:
venv\Scripts\activate

# On Mac/Linux:
source venv/bin/activate
```

Install all dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the root directory:
```
# LLM Providers
OPENAI_API_KEY=your-openai-api-key
ANTHROPIC_API_KEY=your-anthropic-api-key

# LangFuse (self-hosted or cloud)
LANGFUSE_SECRET_KEY=sk-lf-xxxxx
LANGFUSE_PUBLIC_KEY=pk-lf-xxxxx
LANGFUSE_BASE_URL=http://localhost:3000  # or https://cloud.langfuse.com

# LangSmith
LANGSMITH_API_KEY=lsv2_pt_xxxxx
LANGSMITH_PROJECT=your-project-name
LANGSMITH_TRACING=true
```

LangFuse tracing (`code/langfuse_tracing.py`) demonstrates LLM observability with the `@observe()` decorator for automatic tracing.
Run the examples:
```bash
python code/langfuse_tracing.py
```

What it demonstrates:
| Example | Description |
|---|---|
| Simple Question | Single LLM call with automatic tracing |
| Extract Keywords | LLM call + Python post-processing |
| Content Pipeline | Multi-step chain (Draft → Critique → Refine) |
View traces: Open your LangFuse dashboard to see the traces with latency, token usage, and nested spans.
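For reference, a minimal sketch of this tracing pattern, assuming the LangFuse Python SDK's `@observe()` decorator and its OpenAI drop-in wrapper (import paths differ slightly between SDK versions):

```python
from langfuse import observe          # on the v2 SDK: from langfuse.decorators import observe
from langfuse.openai import OpenAI    # drop-in wrapper that records each model call as a span

client = OpenAI()

@observe()  # creates a trace for this function; nested LLM calls appear as child spans
def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(answer("What is LLM observability?"))
```

Credentials are picked up from the LangFuse keys defined in your environment (see the `.env` above).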
LangSmith tracing (`code/langsmith_tracing.py`) demonstrates LLM tracing with the `@traceable()` decorator and `wrap_openai()` for the OpenAI SDK.
Run the examples:
```bash
python code/langsmith_tracing.py
```

What it demonstrates:
| Example | Description |
|---|---|
| Simple Question | Single LLM call with automatic tracing |
| Extract Keywords | LLM call + Python post-processing |
| Content Pipeline | Multi-step chain with metadata |
View traces: Open the LangSmith dashboard to see traces, latency, and token usage.
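A minimal sketch of the same pattern with LangSmith (tracing is activated by the `LANGSMITH_*` variables in `.env`; the model and prompt are illustrative):

```python
from langsmith import traceable
from langsmith.wrappers import wrap_openai
from openai import OpenAI

client = wrap_openai(OpenAI())  # OpenAI calls through this client are logged as LLM runs

@traceable  # records this function as a run under LANGSMITH_PROJECT
def extract_keywords(text: str) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"List five comma-separated keywords for: {text}"}],
    )
    return [kw.strip() for kw in response.choices[0].message.content.split(",")]

if __name__ == "__main__":
    print(extract_keywords("Monitoring and observability for LLM systems in production"))
```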
LiteLLM is a unified LLM gateway with cost tracking, budget enforcement, and multi-provider support.
Start the proxy:
```bash
cd code/litellm
docker compose up -d
```

Access the UI: Open http://localhost:4000/ui
Test with client:
```bash
python code/litellm/client.py
```

Key features:
| Feature | Description |
|---|---|
| Unified API | One endpoint for OpenAI, Anthropic, local models |
| Cost Tracking | Automatic token counting and spend logging |
| Budget Enforcement | Set spending limits per virtual key |
| Custom Pricing | Define per-token costs for self-hosted models |
| Alerting | Slack/email alerts for budget thresholds |
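Any OpenAI-compatible client can talk to the proxy. A minimal sketch, assuming the default port 4000 and a placeholder virtual key (use your proxy's master key or a key generated in the UI):

```python
from openai import OpenAI

# Point the standard OpenAI SDK at the LiteLLM proxy instead of api.openai.com.
# "sk-1234" is a placeholder; spend for this key is tracked against its budget.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # must match a model_name from litellm_config.yaml
    messages=[{"role": "user", "content": "Hello through the LiteLLM proxy!"}],
)
print(response.choices[0].message.content)
```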
Configuration: Edit code/litellm/litellm_config.yaml to add models:
```yaml
model_list:
  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
```

Cleanup:
```bash
cd code/litellm
docker compose down
```

Bifrost is an LLM gateway with model switching and an OpenAI-compatible API.
Start the gateway:
```bash
docker run -p 8080:8080 maximhq/bifrost
```

This starts Bifrost on port 8080. Configure providers via the web UI at http://localhost:8080.
Run the interactive client:
```bash
python code/bifrost/client.py
```

Client commands:
| Command | Description |
|---|---|
| `/model` or `/switch` | Switch to a different model mid-conversation |
| `/models` or `/list` | List all available models |
| `/current` | Show the current model |
| `/help` | Show available commands |
| `quit` or `exit` | Exit the chat |
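Because the gateway exposes an OpenAI-compatible API, switching models is just a matter of changing the `model` field per request. A minimal sketch (the base URL path and model names are assumptions — adjust them to whatever your Bifrost configuration exposes):

```python
from openai import OpenAI

# Assumed OpenAI-compatible route on the local gateway; adjust to your Bifrost config.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused-if-keys-live-in-gateway")

def ask(model: str, prompt: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Same client code, different provider per call — this is what /model and /switch do.
print(ask("gpt-4o-mini", "Summarize what an LLM gateway does."))
print(ask("claude-sonnet-4-20250514", "Summarize what an LLM gateway does."))
```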
Stop the gateway:
```bash
docker stop $(docker ps -q --filter ancestor=maximhq/bifrost)
```

The CloudWatch example (`code/cloudwatch_client.py`) demonstrates publishing custom metrics (such as TTFT) to AWS CloudWatch for Bedrock models.
Prerequisites:
- AWS credentials configured (via `.env` or the AWS CLI)
- A deployed model in AWS Bedrock
Run the example:
```bash
python code/cloudwatch_client.py
```

What it demonstrates:
| Feature | Description |
|---|---|
| TTFT Measurement | Measures Time to First Token using streaming |
| Custom Metrics | Publishes to Bedrock/Custom namespace |
| Bedrock Streaming | Uses invoke_model_with_response_stream |
Environment variables required:
```bash
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_SESSION_TOKEN=your-session-token  # if using temporary credentials
```

View metrics in CloudWatch:
- Go to CloudWatch → Metrics → All metrics
- Look for `Bedrock/Custom` under Custom namespaces
- Select `ModelId` → `TTFT`
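A minimal sketch of the measure-and-publish loop, assuming `boto3` credentials are configured and the Bedrock model ID below is replaced with one enabled in your account:

```python
import json
import time

import boto3

bedrock = boto3.client("bedrock-runtime")
cloudwatch = boto3.client("cloudwatch")

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # placeholder; use a model you have access to

def measure_ttft(prompt: str) -> float:
    """Stream a response and return seconds until the first chunk arrives."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": prompt}],
    })
    start = time.perf_counter()
    response = bedrock.invoke_model_with_response_stream(modelId=MODEL_ID, body=body)
    for _event in response["body"]:          # first streamed event ≈ first token
        return time.perf_counter() - start
    return float("nan")

ttft = measure_ttft("Explain TTFT in one sentence.")
cloudwatch.put_metric_data(
    Namespace="Bedrock/Custom",
    MetricData=[{
        "MetricName": "TTFT",
        "Dimensions": [{"Name": "ModelId", "Value": MODEL_ID}],
        "Value": ttft,
        "Unit": "Seconds",
    }],
)
print(f"Published TTFT = {ttft:.3f}s")
```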
Unit lessons:

| # | Lesson | Topic | Key Concepts |
|---|---|---|---|
| - | Overview | Unit Introduction | LLMOps feedback loop, unit structure |
| 1 | Monitoring Fundamentals | Core Metrics | TTFT, TPS, latency percentiles, throughput |
| 2 | Observability & Tracing | Structured Logging | Logs, metrics, traces, spans, PII handling |
| 3 | LangFuse | Hands-On Tool | @observe() decorator, nested traces, metadata |
| 4 | LangSmith | Hands-On Tool | @traceable(), wrap_openai(), Playground |
| 5 | Alerting & Incident Response | Operations | Thresholds, runbooks, on-call, post-mortems |
| 6 | Cost Monitoring | Optimization | LLM cascading, caching, speculative decoding |
| 7 | LiteLLM | Hands-On Tool | Proxy setup, cost tracking, budget enforcement |
| 8 | Bifrost | Hands-On Tool | Model gateway, routing, load balancing |
| 9 | CloudWatch | AWS Monitoring | Bedrock/SageMaker metrics, dashboards, alarms |
| 10 | Drift Detection | Quality Monitoring | Data drift, LLM-as-Judge, semantic monitoring |
| 11 | Security | Protection | Prompt injection, guardrails, PII redaction |
Repository structure:

```
rt-llm-eng-cert-week10/
├── code/
│   ├── langfuse_tracing.py         # LangFuse tracing examples
│   ├── langsmith_tracing.py        # LangSmith tracing examples
│   ├── cloudwatch_client.py        # CloudWatch custom metrics (TTFT)
│   ├── litellm/
│   │   ├── docker-compose.yaml     # LiteLLM + PostgreSQL setup
│   │   ├── litellm_config.yaml     # Model configuration
│   │   └── client.py               # Interactive client (with streaming)
│   └── bifrost/
│       ├── docker-compose.yaml     # Bifrost gateway setup
│       ├── config.json             # Provider configuration
│       └── client.py               # Interactive client with model switching
├── requirements.txt                # Python dependencies
├── .env                            # Environment variables (create this)
└── README.md
```
| Tool | Purpose | Self-Hosted | Cost Tracking | Tracing |
|---|---|---|---|---|
| LangFuse | LLM Observability | ✅ Yes | ✅ Yes | ✅ Yes |
| LangSmith | LLM Debugging | ❌ No (cloud only) | ✅ Yes | ✅ Yes |
| LiteLLM | Unified Proxy | ✅ Yes | ✅ Yes (custom pricing) | Via integrations |
| Bifrost | LLM Gateway | ✅ Yes | ✅ Yes (built-in catalog) | ❌ Limited |
Choose LangFuse if:
- You want full control with self-hosting
- You need open-source with no vendor lock-in
- You want to build evaluation datasets from production traces
Choose LangSmith if:
- You're already using LangChain
- You want a polished UI with Playground features
- You prefer a managed cloud solution
Choose LiteLLM if:
- You need a unified API for multiple providers
- Custom token pricing for self-hosted models is important
- You want budget enforcement and alerting
Choose Bifrost if:
- You need high-performance routing (< 100µs overhead)
- You want request-type restrictions per provider
- You're building a Go-based infrastructure
Key monitoring metrics:

- TTFT (Time to First Token) - User-perceived latency
- TPS (Tokens per Second) - Generation speed
- Throughput - Requests per second capacity
- Queue Depth - Pending request backlog
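For reference, one common way to compute the first two from a single streamed response (conventions vary, e.g. whether the TTFT wait is excluded from the generation window; the helpers below are illustrative):

```python
def time_to_first_token(t_request: float, t_first_chunk: float) -> float:
    # What the user perceives as "the model has started answering".
    return t_first_chunk - t_request

def tokens_per_second(output_tokens: int, t_first_chunk: float, t_last_chunk: float) -> float:
    # Generation speed over the streaming window, excluding the TTFT wait.
    return output_tokens / max(t_last_chunk - t_first_chunk, 1e-9)
```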
Cost optimization techniques:

- LLM Cascading - Cheap model first, escalate on low confidence
- Prompt Caching - Exact match, semantic, and prefix caching
- Speculative Decoding - Draft model + verification
- Quantization - INT4/INT8 for reduced GPU requirements
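A minimal sketch of the cascading idea named above (model names and the confidence heuristic are purely illustrative):

```python
from openai import OpenAI

client = OpenAI()

def cascade(question: str) -> str:
    # Try the cheap model first and ask it to flag low confidence explicitly.
    cheap = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Answer concisely, or reply exactly UNSURE if you are not confident:\n{question}"}],
    )
    answer = cheap.choices[0].message.content.strip()
    if answer != "UNSURE":
        return answer
    # Escalate to the stronger, more expensive model only when needed.
    strong = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return strong.choices[0].message.content
```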
Security practices:

- Two-LLM Pattern - Gatekeeper model for intent detection
- Guardrails - Input/output validation and filtering
- PII Redaction - Strip sensitive data before API calls
- Agent Security - Deterministic tool contracts, allowlists
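A minimal sketch of pre-call PII redaction with regular expressions (the patterns are illustrative; production systems typically use a dedicated PII detection library):

```python
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace detected PII with placeholder tags before the text leaves your system."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact Jane at jane.doe@example.com or +1 (555) 123-4567 about her claim."
print(redact(prompt))  # Contact Jane at [EMAIL] or [PHONE] about her claim.
```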
This work is licensed under CC BY-NC-SA 4.0.
You are free to share and adapt this material for non-commercial purposes, provided that you:
- Give appropriate credit and indicate any changes made
- Distribute adaptations under the same license
See LICENSE for full terms.
For questions or issues related to this repository, please refer to the course materials or contact your instructor.