Commit 28cdf9f

[LEADS-362] Add Langfuse as a storage backend
Integrate Langfuse as a pluggable storage backend using the existing

Changes:
- Add LangfuseBackendConfig to core/storage/config.py
- Wire langfuse type in core/system/loader.py
- Add LangfuseBackendConfig handling in core/storage/factory.py
- Add LangfuseStorageBackend in core/storage/langfuse_storage.py
- Add langfuse optional dependency to pyproject.toml
- Add langfuse config example to config/system.yaml

Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent e8b5f96 commit 28cdf9f

10 files changed

Lines changed: 578 additions & 3 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -253,6 +253,7 @@ The `storage` list in `system.yaml` selects where results go. You can use **seve
 |--------|---------|
 | **File backend** | CSV, JSON, and TXT under `output_dir`; tune formats with `enabled_outputs` and columns with `csv_columns`. |
 | **Database (optional)** | SQLite, PostgreSQL, or MySQL for querying and analysis; writes are **incremental** per conversation; storage errors are logged as warnings and **do not** abort the evaluation run. |
+| **Langfuse (optional)** | Export evaluation scores to [Langfuse](https://langfuse.com) for observability and analytics. Creates one trace per run with per-metric numeric scores. Requires `pip install 'lightspeed-evaluation[langfuse]'`. |

 For field tables, full YAML examples (file-only, file + SQLite, file + Postgres), CSV column reference, and notes on API token columns, see **[Storage](docs/configuration.md#storage)** in the configuration guide.

config/system.yaml

Lines changed: 7 additions & 0 deletions
@@ -295,6 +295,13 @@ storage:
 # database: "./eval_results.db"
 # table_name: "evaluation_results"

+# Langfuse backend (optional) - export scores to Langfuse observability platform
+# Requires: pip install 'lightspeed-evaluation[langfuse]'
+# Credentials via env vars: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
+# Or provide them inline below:
+# - type: "langfuse"
+# host: "https://cloud.langfuse.com"
+
 # Visualization settings
 visualization:
   figsize: [12, 8] # Graph size (width, height)

docs/configuration.md

Lines changed: 47 additions & 0 deletions
@@ -400,6 +400,53 @@ Save results to a database for querying and analysis. Supports SQLite, PostgreSQ

 > **Note:** Database storage is incremental - results are saved as each conversation completes. Storage failures are logged as warnings but don't stop the evaluation.

+### Langfuse Backend (Optional)
+Export evaluation scores to [Langfuse](https://langfuse.com) for observability, analytics, and score tracking. Creates one trace per evaluation run with one numeric score per metric result.
+
+Requires the Langfuse SDK v4:
+```bash
+# Using pip
+pip install 'lightspeed-evaluation[langfuse]'
+
+# Using uv
+uv sync --extra langfuse
+```
+
+| Setting (storage[type="langfuse"].) | Default | Description |
+|-------------------------------------|---------|-------------|
+| type | `"langfuse"` | Backend type (required) |
+| host | `null` | Langfuse API host URL (falls back to `LANGFUSE_HOST` env var) |
+| public_key | `null` | Langfuse public key (falls back to `LANGFUSE_PUBLIC_KEY` env var) |
+| secret_key | `null` | Langfuse secret key (falls back to `LANGFUSE_SECRET_KEY` env var) |
+
+> **Credentials:** Configure credentials via environment variables (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST`) or inline in the YAML config. Environment variables are the recommended approach; inline config fields take precedence when set.
+
+> **Score handling:** Results with a numeric score (PASS/FAIL) are exported as `NUMERIC` scores. Results without a score (`score=None`, e.g. ERROR/SKIPPED) are skipped. All Langfuse errors are logged but never abort the evaluation.
+
+### Example: Langfuse via Environment Variables
+```yaml
+storage:
+  - type: "file"
+    output_dir: "./eval_output"
+  - type: "langfuse"
+```
+```bash
+export LANGFUSE_PUBLIC_KEY="pk-lf-..."
+export LANGFUSE_SECRET_KEY="sk-lf-..."
+export LANGFUSE_HOST="https://cloud.langfuse.com"
+```
+
+### Example: Langfuse with Inline Credentials
+```yaml
+storage:
+  - type: "file"
+    output_dir: "./eval_output"
+  - type: "langfuse"
+    host: "https://cloud.langfuse.com"
+    public_key: "pk-lf-..."
+    secret_key: "sk-lf-..."
+```
+
 ### Output types

 | Output type (in `enabled_outputs`) | Description |
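
Review note: the credential rule documented in this hunk (inline config fields first, then the `LANGFUSE_*` environment variables) could be sketched roughly as below. This is an illustration only, not the code from this commit: `resolve_credential` is a hypothetical helper name, and treating an empty string as "unset" is an assumption of the sketch. The actual backend may instead leave the environment fallback to the Langfuse SDK.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    inline_value: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the inline config value if set, else the environment variable.

    `env` is injectable so the rule can be exercised without touching the
    real process environment (hypothetical parameter, for illustration).
    """
    source = os.environ if env is None else env
    if inline_value:  # None and "" both count as unset in this sketch
        return inline_value
    return source.get(env_var) or None


# Usage with a fake environment instead of real secrets.
fake_env = {"LANGFUSE_HOST": "https://cloud.langfuse.com"}
host_from_env = resolve_credential(None, "LANGFUSE_HOST", fake_env)
host_inline = resolve_credential("https://langfuse.example", "LANGFUSE_HOST", fake_env)
missing_key = resolve_credential(None, "LANGFUSE_PUBLIC_KEY", fake_env)
```

Here `host_inline` wins over the environment value, while `missing_key` is `None`, which is the case where a backend would have nothing to pass to the client.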

pyproject.toml

Lines changed: 9 additions & 0 deletions
@@ -53,6 +53,15 @@ nlp-metrics = [
     "rapidfuzz>=3.0.0,<=3.14.3", # Required for semantic_similarity_distance
 ]

+# Langfuse observability - export evaluation scores to Langfuse
+# Install with:
+# pip install 'lightspeed-evaluation[langfuse]'
+# or
+# uv sync --extra langfuse
+langfuse = [
+    "langfuse>=4.0.0,<5.0.0",
+]
+
 [dependency-groups]
 dev = [
     "bandit>=1.7.0,<=1.9.2",

src/lightspeed_evaluation/core/storage/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,7 @@
 from lightspeed_evaluation.core.storage.config import (
     DatabaseBackendConfig,
     FileBackendConfig,
+    LangfuseBackendConfig,
     StorageBackendConfig,
 )
 from lightspeed_evaluation.core.storage.factory import (
@@ -56,6 +57,7 @@
     "StorageError",
     "FileBackendConfig",
     "DatabaseBackendConfig",
+    "LangfuseBackendConfig",
     "StorageBackendConfig",
     "CompositeStorageBackend",
     "NoOpStorageBackend",

src/lightspeed_evaluation/core/storage/config.py

Lines changed: 33 additions & 2 deletions
@@ -1,6 +1,6 @@
 """Configuration models for storage backends.

-Defines Pydantic models for file and database storage configuration.
+Defines Pydantic models for file, database, and Langfuse storage configuration.
 """

 from typing import Annotated, Literal, Optional
@@ -126,8 +126,39 @@ def validate_connection_fields(self) -> "DatabaseBackendConfig":
         return self


+class LangfuseBackendConfig(BaseModel):
+    """Configuration for Langfuse observability storage backend.
+
+    Exports evaluation scores to Langfuse as a trace with per-metric scores.
+    Requires the ``langfuse`` optional extra: ``pip install 'lightspeed-evaluation[langfuse]'``
+
+    Credentials are resolved from config fields first, then ``LANGFUSE_PUBLIC_KEY``,
+    ``LANGFUSE_SECRET_KEY``, and ``LANGFUSE_HOST`` environment variables as fallback.
+
+    Example:
+        - type: "langfuse"
+          host: "https://cloud.langfuse.com"
+    """
+
+    model_config = ConfigDict(extra="forbid")
+
+    type: Literal["langfuse"] = "langfuse"
+    host: Optional[str] = Field(
+        default=None,
+        description="Langfuse API host URL (falls back to LANGFUSE_HOST env var)",
+    )
+    public_key: Optional[str] = Field(
+        default=None,
+        description="Langfuse public key (falls back to LANGFUSE_PUBLIC_KEY env var)",
+    )
+    secret_key: Optional[str] = Field(
+        default=None,
+        description="Langfuse secret key (falls back to LANGFUSE_SECRET_KEY env var)",
+    )
+
+
 # Discriminated union for polymorphic storage configuration
 StorageBackendConfig = Annotated[
-    FileBackendConfig | DatabaseBackendConfig,
+    FileBackendConfig | DatabaseBackendConfig | LangfuseBackendConfig,
     Field(discriminator="type"),
 ]
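
Review note: to see how the extended discriminated union picks a model from the `type` field, here is a small self-contained sketch using Pydantic v2. It is not the project's code: `FileBackendConfig` is a simplified stand-in with a single assumed field, the `Field(...)` descriptions are dropped, and `DatabaseBackendConfig` is left out for brevity. `is_rejected` is a hypothetical helper added only for the demonstration.

```python
from typing import Annotated, Literal, Optional, Union

from pydantic import BaseModel, ConfigDict, Field, TypeAdapter, ValidationError


class FileBackendConfig(BaseModel):
    # Simplified stand-in; the real model has more fields.
    model_config = ConfigDict(extra="forbid")
    type: Literal["file"] = "file"
    output_dir: str = "./eval_output"


class LangfuseBackendConfig(BaseModel):
    # Same field names as the class in this diff, without descriptions.
    model_config = ConfigDict(extra="forbid")
    type: Literal["langfuse"] = "langfuse"
    host: Optional[str] = None
    public_key: Optional[str] = None
    secret_key: Optional[str] = None


# The "type" value selects which model validates each list entry.
StorageBackendConfig = Annotated[
    Union[FileBackendConfig, LangfuseBackendConfig],
    Field(discriminator="type"),
]

adapter = TypeAdapter(list[StorageBackendConfig])

parsed = adapter.validate_python(
    [
        {"type": "file", "output_dir": "./eval_output"},
        {"type": "langfuse", "host": "https://cloud.langfuse.com"},
    ]
)


def is_rejected(entry: dict) -> bool:
    """Return True if a single storage entry fails validation."""
    try:
        adapter.validate_python([entry])
    except ValidationError:
        return True
    return False
```

With `extra="forbid"`, a misspelled key such as `hostt` is rejected rather than silently ignored, and an unknown `type` value fails at the discriminator before any model is tried.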

src/lightspeed_evaluation/core/storage/factory.py

Lines changed: 5 additions & 0 deletions
@@ -14,9 +14,11 @@
 from lightspeed_evaluation.core.storage.config import (
     DatabaseBackendConfig,
     FileBackendConfig,
+    LangfuseBackendConfig,
     StorageBackendConfig,
 )
 from lightspeed_evaluation.core.storage.file_storage import FileStorageBackend
+from lightspeed_evaluation.core.storage.langfuse_storage import LangfuseStorageBackend
 from lightspeed_evaluation.core.storage.protocol import BaseStorageBackend
 from lightspeed_evaluation.core.storage.sql_storage import SQLStorageBackend
 from lightspeed_evaluation.core.system.exceptions import ConfigurationError
@@ -127,6 +129,9 @@ def create_pipeline_storage_backend(
                     "File storage entries in ``storage`` require ``system_config`` "
                     "when building the pipeline storage backend."
                 )
+        elif isinstance(config, LangfuseBackendConfig):
+            logger.info("Pipeline storage: langfuse backend")
+            backends.append(LangfuseStorageBackend(config))
         else:
             raise ConfigurationError(
                 f"Unknown storage backend type: {type(config).__name__!r}"
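
Review note: `langfuse_storage.py` is not shown in this view, so the sketch below only illustrates the documented score-handling contract (results with `score=None` are skipped, and export errors are logged without aborting the run). Everything here is hypothetical: `MetricResult`, `export_scores`, and the `send_score` callback are stand-ins, and no real Langfuse SDK call is used or implied.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

logger = logging.getLogger(__name__)


@dataclass
class MetricResult:
    """Hypothetical, simplified stand-in for one evaluation result."""

    metric: str
    score: Optional[float]
    status: str  # e.g. "PASS", "FAIL", "ERROR", "SKIPPED"


def export_scores(
    results: Iterable[MetricResult],
    send_score: Callable[[str, float], None],
) -> int:
    """Send each scored result; skip score=None; log failures without raising.

    `send_score` stands in for whatever client call records a numeric score.
    Returns the number of scores that were sent successfully.
    """
    exported = 0
    for result in results:
        if result.score is None:
            continue  # ERROR/SKIPPED results carry no score to export
        try:
            send_score(result.metric, result.score)
            exported += 1
        except Exception as exc:  # broad on purpose: storage must not abort the run
            logger.warning("Score export failed for %s: %s", result.metric, exc)
    return exported


# Usage: a fake sender that records calls and fails for one metric.
sent: list = []


def fake_send(name: str, value: float) -> None:
    if name == "metric_b":
        raise RuntimeError("simulated network failure")
    sent.append((name, value))


count = export_scores(
    [
        MetricResult("metric_a", 0.9, "PASS"),
        MetricResult("metric_b", 0.2, "FAIL"),
        MetricResult("metric_c", None, "ERROR"),
    ],
    fake_send,
)
```

Catching `Exception` broadly is a deliberate choice in this sketch, mirroring the stated behaviour that storage failures are logged rather than propagated.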
