Merged
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -253,6 +253,7 @@ The `storage` list in `system.yaml` selects where results go. You can use **seve
|--------|---------|
| **File backend** | CSV, JSON, and TXT under `output_dir`; tune formats with `enabled_outputs` and columns with `csv_columns`. |
| **Database (optional)** | SQLite, PostgreSQL, or MySQL for querying and analysis; writes are **incremental** per conversation; storage errors are logged as warnings and **do not** abort the evaluation run. |
| **Langfuse (optional)** | Export evaluation scores to [Langfuse](https://langfuse.com) for observability and analytics. Creates one trace per run with per-metric numeric scores. Requires `pip install 'lightspeed-evaluation[langfuse]'`. |

For field tables, full YAML examples (file-only, file + SQLite, file + Postgres), CSV column reference, and notes on API token columns, see **[Storage](docs/configuration.md#storage)** in the configuration guide.

7 changes: 7 additions & 0 deletions config/system.yaml
@@ -295,6 +295,13 @@ storage:
  #   database: "./eval_results.db"
  #   table_name: "evaluation_results"

  # Langfuse backend (optional) - export scores to Langfuse observability platform
  # Requires: pip install 'lightspeed-evaluation[langfuse]'
  # Credentials via env vars: LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST
  # Or provide them inline below:
  # - type: "langfuse"
  #   host: "https://cloud.langfuse.com"

# Visualization settings
visualization:
  figsize: [12, 8] # Graph size (width, height)
47 changes: 47 additions & 0 deletions docs/configuration.md
@@ -400,6 +400,53 @@ Save results to a database for querying and analysis. Supports SQLite, PostgreSQ

> **Note:** Database storage is incremental - results are saved as each conversation completes. Storage failures are logged as warnings but don't stop the evaluation.

### Langfuse Backend (Optional)
Export evaluation scores to [Langfuse](https://langfuse.com) for observability, analytics, and score tracking. Creates one trace per evaluation run with one numeric score per metric result.

Requires the Langfuse SDK v4, which the `langfuse` extra installs:
```bash
# Using pip
pip install 'lightspeed-evaluation[langfuse]'

# Using uv
uv sync --extra langfuse
```

| Setting (storage[type="langfuse"].) | Default | Description |
|-------------------------------------|---------|-------------|
| type | `"langfuse"` | Backend type (required) |
| host | `null` | Langfuse API host URL (falls back to `LANGFUSE_HOST` env var) |
| public_key | `null` | Langfuse public key (falls back to `LANGFUSE_PUBLIC_KEY` env var) |
| secret_key | `null` | Langfuse secret key (falls back to `LANGFUSE_SECRET_KEY` env var) |

> **Credentials:** Configure credentials via environment variables (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECRET_KEY`, `LANGFUSE_HOST`) or inline in the YAML config. Environment variables are the recommended approach. If a field is also set inline, the inline value takes precedence over the environment variable.
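
The precedence described above can be sketched in a few lines of Python. This is only an illustration, not the backend's actual code: `resolve_langfuse_settings` is a hypothetical helper and the values are placeholders. Only the three environment variable names come from the documentation.

```python
import os
from typing import Mapping, Optional


def resolve_langfuse_settings(
    host: Optional[str] = None,
    public_key: Optional[str] = None,
    secret_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict[str, Optional[str]]:
    """Hypothetical helper: inline config values win, env vars are the fallback."""
    return {
        "host": host or env.get("LANGFUSE_HOST"),
        "public_key": public_key or env.get("LANGFUSE_PUBLIC_KEY"),
        "secret_key": secret_key or env.get("LANGFUSE_SECRET_KEY"),
    }


# Fake environment for the example: the inline host overrides LANGFUSE_HOST,
# while both keys are read from the environment mapping.
fake_env = {
    "LANGFUSE_HOST": "https://cloud.langfuse.com",
    "LANGFUSE_PUBLIC_KEY": "pk-lf-example",
    "LANGFUSE_SECRET_KEY": "sk-lf-example",
}
settings = resolve_langfuse_settings(
    host="https://langfuse.example.internal", env=fake_env
)
print(settings["host"])
```

Passing the environment as a mapping keeps the sketch testable without touching real process variables.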

> **Score handling:** Results with a numeric score (PASS/FAIL) are exported as `NUMERIC` scores. Results without a score (`score=None`, e.g. ERROR/SKIPPED) are skipped. All Langfuse errors are logged but never abort the evaluation.
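
The skip rule for unscored results can be illustrated with a small, self-contained sketch. `MetricResult`, `exportable_scores`, and the metric names are hypothetical stand-ins, not the project's real types. The sketch assumes only what the note states: a result is exported when its score is not `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetricResult:
    """Hypothetical stand-in for one evaluation result."""

    metric: str
    result: str  # e.g. "PASS", "FAIL", "ERROR", "SKIPPED"
    score: Optional[float]


def exportable_scores(results: list[MetricResult]) -> list[tuple[str, float]]:
    """Keep only results that carry a numeric score; score=None is skipped."""
    # Compare against None explicitly so a legitimate score of 0.0 is kept.
    return [(r.metric, float(r.score)) for r in results if r.score is not None]


results = [
    MetricResult("answer_relevancy", "PASS", 0.91),
    MetricResult("faithfulness", "FAIL", 0.0),
    MetricResult("context_recall", "ERROR", None),
]
print(exportable_scores(results))
```

The explicit `is not None` check matters: a truthiness test would silently drop a FAIL result whose score is `0.0`.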

### Example: Langfuse via Environment Variables
```yaml
storage:
  - type: "file"
    output_dir: "./eval_output"
  - type: "langfuse"
```
```bash
export LANGFUSE_PUBLIC_KEY="pk-lf-..."
export LANGFUSE_SECRET_KEY="sk-lf-..."
export LANGFUSE_HOST="https://cloud.langfuse.com"
```

### Example: Langfuse with Inline Credentials
```yaml
storage:
  - type: "file"
    output_dir: "./eval_output"
  - type: "langfuse"
    host: "https://cloud.langfuse.com"
    public_key: "pk-lf-..."
    secret_key: "sk-lf-..."
```

### Output types

| Output type (in `enabled_outputs`) | Description |
9 changes: 9 additions & 0 deletions pyproject.toml
@@ -53,6 +53,15 @@ nlp-metrics = [
    "rapidfuzz>=3.0.0,<=3.14.3", # Required for semantic_similarity_distance
]

# Langfuse observability - export evaluation scores to Langfuse
# Install with:
# pip install 'lightspeed-evaluation[langfuse]'
# or
# uv sync --extra langfuse
langfuse = [
    "langfuse>=4.0.0,<5.0.0",
]

[dependency-groups]
dev = [
    "bandit>=1.7.0,<=1.9.2",
2 changes: 2 additions & 0 deletions src/lightspeed_evaluation/core/storage/__init__.py
@@ -32,6 +32,7 @@
from lightspeed_evaluation.core.storage.config import (
    DatabaseBackendConfig,
    FileBackendConfig,
    LangfuseBackendConfig,
    StorageBackendConfig,
)
from lightspeed_evaluation.core.storage.factory import (
@@ -56,6 +57,7 @@
    "StorageError",
    "FileBackendConfig",
    "DatabaseBackendConfig",
    "LangfuseBackendConfig",
    "StorageBackendConfig",
    "CompositeStorageBackend",
    "NoOpStorageBackend",
35 changes: 33 additions & 2 deletions src/lightspeed_evaluation/core/storage/config.py
@@ -1,6 +1,6 @@
"""Configuration models for storage backends.

-Defines Pydantic models for file and database storage configuration.
+Defines Pydantic models for file, database, and Langfuse storage configuration.
"""

from typing import Annotated, Literal, Optional
@@ -126,8 +126,39 @@ def validate_connection_fields(self) -> "DatabaseBackendConfig":
        return self


class LangfuseBackendConfig(BaseModel):
    """Configuration for Langfuse observability storage backend.

    Exports evaluation scores to Langfuse as a trace with per-metric scores.
    Requires the ``langfuse`` optional extra: ``pip install 'lightspeed-evaluation[langfuse]'``

    Credentials are resolved from config fields first, then ``LANGFUSE_PUBLIC_KEY``,
    ``LANGFUSE_SECRET_KEY``, and ``LANGFUSE_HOST`` environment variables as fallback.

    Example:
        - type: "langfuse"
          host: "https://cloud.langfuse.com"
    """

    model_config = ConfigDict(extra="forbid")

    type: Literal["langfuse"] = "langfuse"
    host: Optional[str] = Field(
        default=None,
        description="Langfuse API host URL (falls back to LANGFUSE_HOST env var)",
    )
    public_key: Optional[str] = Field(
        default=None,
        description="Langfuse public key (falls back to LANGFUSE_PUBLIC_KEY env var)",
    )
    secret_key: Optional[str] = Field(
        default=None,
        description="Langfuse secret key (falls back to LANGFUSE_SECRET_KEY env var)",
    )


# Discriminated union for polymorphic storage configuration
StorageBackendConfig = Annotated[
-    FileBackendConfig | DatabaseBackendConfig,
+    FileBackendConfig | DatabaseBackendConfig | LangfuseBackendConfig,
    Field(discriminator="type"),
]
5 changes: 5 additions & 0 deletions src/lightspeed_evaluation/core/storage/factory.py
@@ -14,9 +14,11 @@
from lightspeed_evaluation.core.storage.config import (
    DatabaseBackendConfig,
    FileBackendConfig,
    LangfuseBackendConfig,
    StorageBackendConfig,
)
from lightspeed_evaluation.core.storage.file_storage import FileStorageBackend
from lightspeed_evaluation.core.storage.langfuse_storage import LangfuseStorageBackend
from lightspeed_evaluation.core.storage.protocol import BaseStorageBackend
from lightspeed_evaluation.core.storage.sql_storage import SQLStorageBackend
from lightspeed_evaluation.core.system.exceptions import ConfigurationError
@@ -127,6 +129,9 @@ def create_pipeline_storage_backend(
                    "File storage entries in ``storage`` require ``system_config`` "
                    "when building the pipeline storage backend."
                )
        elif isinstance(config, LangfuseBackendConfig):
            logger.info("Pipeline storage: langfuse backend")
            backends.append(LangfuseStorageBackend(config))
        else:
            raise ConfigurationError(
                f"Unknown storage backend type: {type(config).__name__!r}"