Commit 68bc75d

Merge pull request lightspeed-core#177 from narmaku/RSPEED-2485/programmatic-api
feat: add programmatic API for library integration
2 parents cb9c124 + 4bb159c commit 68bc75d

8 files changed

Lines changed: 606 additions & 71 deletions

docs/EVALUATION_GUIDE.md

Lines changed: 133 additions & 12 deletions
```diff
@@ -21,16 +21,17 @@
 7. [Step-by-Step Setup](#7-step-by-step-setup)
 8. [Configuration Guide](#8-configuration-guide)
 9. [Running Evaluations](#9-running-evaluations)
-10. [Understanding Results](#10-understanding-results)
+10. [Programmatic API](#10-programmatic-api)
+11. [Understanding Results](#11-understanding-results)

 ### Part 4: Real-World Application
-11. [Common Use Cases](#11-common-use-cases)
-12. [Best Practices](#12-best-practices)
-13. [Troubleshooting](#13-troubleshooting)
+12. [Common Use Cases](#12-common-use-cases)
+13. [Best Practices](#13-best-practices)
+14. [Troubleshooting](#14-troubleshooting)

 ### Part 5: Reference Materials
-14. [Quick Reference Tables](#14-quick-reference-tables)
-15. [Resources & Links](#15-resources--links)
+15. [Quick Reference Tables](#15-quick-reference-tables)
+16. [Resources & Links](#16-resources--links)

 ---

```

```diff
@@ -856,7 +857,127 @@ lightspeed-eval \

 ---

-## 10. Understanding Results
+## 10. Programmatic API
+
+In addition to the CLI, the framework can be used as a Python library. This is useful when you want to integrate evaluations into scripts, notebooks, CI pipelines, or custom tooling—without dealing with YAML files or command-line arguments.
+
+### Available Functions
+
+| Function | Purpose |
+|----------|---------|
+| `evaluate(config, data)` | Evaluate a list of conversations |
+| `evaluate_conversation(config, data)` | Evaluate a single conversation |
+| `evaluate_turn(config, turn)` | Evaluate a single turn |
+
+All three functions return `list[EvaluationResult]`.
+
+### Basic Example
+
+```python
+from lightspeed_evaluation import (
+    evaluate,
+    EvaluationData,
+    LLMConfig,
+    SystemConfig,
+    TurnData,
+)
+
+# 1. Build configuration
+config = SystemConfig(
+    llm=LLMConfig(provider="openai", model="gpt-4o-mini"),
+)
+
+# 2. Build evaluation data
+data = EvaluationData(
+    conversation_group_id="my_eval",
+    turns=[
+        TurnData(
+            turn_id="t1",
+            query="What is OpenShift?",
+            response="OpenShift is a Kubernetes-based container platform.",
+            expected_response="OpenShift is Red Hat's Kubernetes platform.",
+            turn_metrics=["ragas:response_relevancy"],
+        ),
+    ],
+)
+
+# 3. Run evaluation
+results = evaluate(config, [data])
+
+# 4. Inspect results
+for r in results:
+    print(f"{r.metric_identifier}: {r.result} (score={r.score})")
+```
+
+### Evaluating a Single Turn
+
+Use `evaluate_turn()` when you want to evaluate one question-answer pair. You can override metrics without modifying the original turn object:
+
+```python
+from lightspeed_evaluation import evaluate_turn, SystemConfig, TurnData
+
+config = SystemConfig()
+turn = TurnData(
+    turn_id="t1",
+    query="What is a pod?",
+    response="A pod is the smallest deployable unit in Kubernetes.",
+)
+
+results = evaluate_turn(
+    config,
+    turn,
+    metrics=["ragas:response_relevancy", "ragas:faithfulness"],
+)
+```
+
+### Evaluating a Single Conversation
+
+Use `evaluate_conversation()` when you have a single `EvaluationData` object:
+
+```python
+from lightspeed_evaluation import evaluate_conversation, EvaluationData, SystemConfig, TurnData
+
+config = SystemConfig()
+data = EvaluationData(
+    conversation_group_id="support_conv",
+    turns=[
+        TurnData(turn_id="t1", query="Hello", response="Hi! How can I help?"),
+        TurnData(turn_id="t2", query="What is OCP?", response="OCP is OpenShift."),
+    ],
+    conversation_metrics=["deepeval:knowledge_retention"],
+)
+
+results = evaluate_conversation(config, data)
+```
+
+### Working with Results
+
+The `evaluate()` functions return `list[EvaluationResult]`. Each result contains:
+
+| Field | Description |
+|-------|-------------|
+| `result` | Status: `PASS`, `FAIL`, `ERROR`, or `SKIPPED` |
+| `score` | Numeric score between 0.0 and 1.0 |
+| `threshold` | Pass/fail threshold used |
+| `reason` | Explanation from the judge LLM |
+| `metric_identifier` | Which metric produced this result |
+| `turn_id` | Turn ID (for turn-level metrics) |
+| `conversation_group_id` | Conversation group ID |
+
+No files are generated by default—file output is the caller's responsibility. If you need CSV/JSON reports, use the `OutputHandler` separately.
+
+### CLI vs Programmatic API
+
+| Aspect | CLI (`lightspeed-eval`) | Programmatic API |
+|--------|------------------------|------------------|
+| Configuration | YAML files | Python objects (`SystemConfig`) |
+| Input data | YAML files | Python objects (`EvaluationData`) |
+| Output | CSV, JSON, TXT files + graphs | `list[EvaluationResult]` in memory |
+| Use case | Standalone runs, CI jobs | Library integration, notebooks, scripts |
+
+---
+
+## 11. Understanding Results

 ### Output Files

```

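The new "Working with Results" subsection says no files are written by default and that file output is the caller's job. As a minimal, standard-library-only sketch of that caller-side step, the block below counts results per status, writes them to CSV, and derives a CI-style exit code. It uses a hypothetical `Result` dataclass carrying the documented fields so it runs without the framework installed; with the real API you would pass the objects returned by `evaluate()`, assuming they expose those fields as plain attributes. The helper names (`summarize`, `write_csv`, `exit_code`) and the rule that any `FAIL` or `ERROR` fails the job are illustrative choices, not framework behavior.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, TextIO


# Hypothetical stand-in exposing the fields documented for EvaluationResult.
@dataclass
class Result:
    conversation_group_id: str
    turn_id: Optional[str]
    metric_identifier: str
    result: str  # "PASS", "FAIL", "ERROR", or "SKIPPED"
    score: Optional[float]
    threshold: Optional[float]
    reason: str


FIELDS = ["conversation_group_id", "turn_id", "metric_identifier",
          "result", "score", "threshold", "reason"]


def summarize(results: Iterable[Result]) -> Counter:
    """Count results per status string."""
    return Counter(r.result for r in results)


def write_csv(results: Iterable[Result], out: TextIO) -> None:
    """Write a header plus one row per result, reading attributes by name."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for r in results:
        writer.writerow({name: getattr(r, name) for name in FIELDS})


def exit_code(results: Iterable[Result]) -> int:
    """Illustrative CI rule: non-zero if any metric failed or errored."""
    counts = summarize(results)
    return 1 if counts["FAIL"] or counts["ERROR"] else 0


if __name__ == "__main__":
    # Illustrative values only, not real evaluation output.
    demo = [
        Result("my_eval", "t1", "ragas:response_relevancy", "PASS", 0.9, 0.8, "on topic"),
        Result("my_eval", "t1", "ragas:faithfulness", "FAIL", 0.4, 0.8, "unsupported claim"),
    ]
    buf = io.StringIO()
    write_csv(demo, buf)
    print(buf.getvalue())
    print(dict(summarize(demo)), "exit code:", exit_code(demo))
```

To write to disk instead of a buffer, open the target with `open(path, "w", newline="")`, as the `csv` module documentation recommends for writer files.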
```diff
@@ -956,7 +1077,7 @@ ragas:faithfulness:

 # Part 4: Real-World Application

-## 11. Common Use Cases
+## 12. Common Use Cases

 ### Use Case 1: Quality Assurance for Customer Support Bot

```

```diff
@@ -1132,7 +1253,7 @@ exit $?

 ---

-## 12. Best Practices
+## 13. Best Practices

 ### 1. Start Small, Scale Up

```

```diff
@@ -1257,7 +1378,7 @@ llm:

 ---

-## 13. Troubleshooting
+## 14. Troubleshooting

 ### Issue 1: "No API key found"

```

```diff
@@ -1468,7 +1589,7 @@ lightspeed-eval --eval-data config/eval_batch2.yaml

 # Part 5: Reference Materials

-## 14. Quick Reference Tables
+## 15. Quick Reference Tables

 ### All Metrics at a Glance

```

```diff
@@ -1564,7 +1685,7 @@ uv run python script/run_multi_provider_eval.py \
 ---


-## 15. Resources & Links
+## 16. Resources & Links

 ### Official Framework Documentation

```

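Apart from the new section, every hunk in this file renumbers a heading, and each renumbered heading must still match its table-of-contents anchor (for example `## 16. Resources & Links` and `#16-resources--links`, where the dropped `&` leaves a double hyphen). The sketch below approximates GitHub's heading-anchor rule to flag stale TOC links: lowercase, remove punctuation other than hyphens, underscores, and spaces, then turn each space into a hyphen. It is a simplification (GitHub's slugger also covers more Unicode cases and suffixes duplicate headings with `-1`, `-2`, and so on), and `slugify` and `stale_anchors` are hypothetical helpers, not part of the repository.

```python
import re


def slugify(heading: str) -> str:
    """Approximate GitHub's anchor for a Markdown heading (ASCII text assumed)."""
    text = heading.lstrip("#").strip().lower()
    # Drop punctuation: keep word characters, hyphens, and spaces.
    text = re.sub(r"[^\w\- ]", "", text)
    # Every remaining space becomes a hyphen; runs are not collapsed.
    return text.replace(" ", "-")


def stale_anchors(toc_anchors: list[str], headings: list[str]) -> list[str]:
    """Return TOC anchors (e.g. "#10-programmatic-api") with no matching heading."""
    slugs = {slugify(h) for h in headings}
    return [a for a in toc_anchors if a.lstrip("#") not in slugs]


if __name__ == "__main__":
    headings = ["## 10. Programmatic API", "## 16. Resources & Links"]
    toc = ["#10-programmatic-api", "#16-resources--links", "#15-resources--links"]
    print(stale_anchors(toc, headings))  # ['#15-resources--links']
```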
src/lightspeed_evaluation/__init__.py

Lines changed: 5 additions & 0 deletions
```diff
@@ -12,6 +12,7 @@

 if TYPE_CHECKING:
     # ruff: noqa: F401
+    from lightspeed_evaluation.api import evaluate, evaluate_conversation, evaluate_turn
     from lightspeed_evaluation.core.api import APIClient
     from lightspeed_evaluation.core.llm import LLMManager
     from lightspeed_evaluation.core.models import (
```

```diff
@@ -42,6 +43,10 @@
 __version__ = "0.5.0"

 _LAZY_IMPORTS = {
+    # Programmatic API
+    "evaluate": ("lightspeed_evaluation.api", "evaluate"),
+    "evaluate_conversation": ("lightspeed_evaluation.api", "evaluate_conversation"),
+    "evaluate_turn": ("lightspeed_evaluation.api", "evaluate_turn"),
     # Main pipeline
     "EvaluationPipeline": (
         "lightspeed_evaluation.pipeline.evaluation",
```

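The `__init__.py` change registers the three functions twice: as real imports under `if TYPE_CHECKING:`, which only static type checkers and IDEs evaluate, and as `(module, attribute)` pairs in `_LAZY_IMPORTS`. The diff does not show the code that consumes `_LAZY_IMPORTS`; a common mechanism for this pattern is a module-level `__getattr__` (PEP 562), which Python calls only when normal attribute lookup on the module fails. The sketch below assumes that mechanism; `make_lazy_getattr` is a hypothetical helper, and the demo resolves standard-library names so it runs anywhere.

```python
import importlib
import types
from typing import Any, Callable


def make_lazy_getattr(
    module_globals: dict[str, Any],
    lazy_imports: dict[str, tuple[str, str]],
) -> Callable[[str], Any]:
    """Build a PEP 562 module-level __getattr__ from {name: (module_path, attr)}."""

    def __getattr__(name: str) -> Any:
        try:
            module_path, attr = lazy_imports[name]
        except KeyError:
            raise AttributeError(f"module has no attribute {name!r}") from None
        # Import only on first access, then cache in the module namespace so
        # later lookups find the name directly and never reach __getattr__.
        value = getattr(importlib.import_module(module_path), attr)
        module_globals[name] = value
        return value

    return __getattr__


if __name__ == "__main__":
    # Demo against a stdlib target; a package __init__.py would instead write
    # __getattr__ = make_lazy_getattr(globals(), _LAZY_IMPORTS)
    demo = types.ModuleType("demo")
    demo.__getattr__ = make_lazy_getattr(vars(demo), {"sqrt": ("math", "sqrt")})
    print("sqrt" in vars(demo))  # False: nothing imported yet
    print(demo.sqrt(16.0))       # 4.0, resolved through __getattr__
    print("sqrt" in vars(demo))  # True: cached after first access
```

Under that assumption, `import lightspeed_evaluation` stays cheap: the API module and its dependencies load only when one of these names is first accessed.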

src/lightspeed_evaluation/api.py

Lines changed: 110 additions & 0 deletions
```diff
@@ -0,0 +1,110 @@
+"""Programmatic API for the LightSpeed Evaluation Framework.
+
+Provides clean public functions for using the framework as a Python library,
+without requiring YAML files or CLI argument parsing.
+
+Example usage::
+
+    from lightspeed_evaluation import evaluate, SystemConfig, EvaluationData, TurnData
+
+    config = SystemConfig(llm=LLMConfig(provider="openai", model="gpt-4o-mini"))
+    data = EvaluationData(
+        conversation_group_id="my_eval",
+        turns=[TurnData(turn_id="t1", query="What is OCP?", response="...")],
+    )
+    results = evaluate(config, [data])
+"""
+
+from typing import Optional
+
+from lightspeed_evaluation.core.models import (
+    EvaluationData,
+    EvaluationResult,
+    SystemConfig,
+    TurnData,
+)
+from lightspeed_evaluation.core.system import ConfigLoader
+from lightspeed_evaluation.pipeline.evaluation import EvaluationPipeline
+
+
+def evaluate(
+    config: SystemConfig,
+    data: list[EvaluationData],
+    output_dir: Optional[str] = None,
+) -> list[EvaluationResult]:
+    """Run evaluation on the provided data using the given configuration.
+
+    Creates a fully-initialized pipeline from the ``SystemConfig``, runs
+    evaluation on every conversation in *data*, and returns the raw results.
+    No reports are generated — file I/O is the caller's responsibility.
+
+    Args:
+        config: A pre-built SystemConfig instance.
+        data: List of EvaluationData conversations to evaluate.
+        output_dir: Optional override for the output directory.
+
+    Returns:
+        List of EvaluationResult objects (one per metric per turn/conversation).
+    """
+    if not data:
+        return []
+
+    loader = ConfigLoader.from_config(config)
+    pipeline = EvaluationPipeline(loader, output_dir)
+    try:
+        return pipeline.run_evaluation(data)
+    finally:
+        pipeline.close()
+
+
+def evaluate_conversation(
+    config: SystemConfig,
+    data: EvaluationData,
+    output_dir: Optional[str] = None,
+) -> list[EvaluationResult]:
+    """Evaluate a single conversation group.
+
+    Convenience wrapper around :func:`evaluate` that wraps *data* in a list.
+
+    Args:
+        config: A pre-built SystemConfig instance.
+        data: A single EvaluationData conversation to evaluate.
+        output_dir: Optional override for the output directory.
+
+    Returns:
+        List of EvaluationResult objects.
+    """
+    return evaluate(config, [data], output_dir=output_dir)
+
+
+def evaluate_turn(
+    config: SystemConfig,
+    turn: TurnData,
+    metrics: Optional[list[str]] = None,
+    conversation_group_id: str = "programmatic_eval",
+    output_dir: Optional[str] = None,
+) -> list[EvaluationResult]:
+    """Evaluate a single turn.
+
+    Wraps the turn in an :class:`EvaluationData` instance and delegates to
+    :func:`evaluate`. If *metrics* is provided, a copy of the turn is created
+    with updated ``turn_metrics``.
+
+    Args:
+        config: A pre-built SystemConfig instance.
+        turn: The TurnData to evaluate.
+        metrics: Optional list of metric identifiers to override turn_metrics.
+        conversation_group_id: Conversation group ID for the wrapper.
+        output_dir: Optional override for the output directory.
+
+    Returns:
+        List of EvaluationResult objects.
+    """
+    if metrics is not None:
+        turn = TurnData.model_validate({**turn.model_dump(), "turn_metrics": metrics})
+
+    data = EvaluationData(
+        conversation_group_id=conversation_group_id,
+        turns=[turn],
+    )
+    return evaluate(config, [data], output_dir=output_dir)
```
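
In the new module, `evaluate()` is the only function that drives the pipeline: it returns `[]` for empty input, builds the loader and pipeline, and calls `pipeline.close()` in a `finally` block so cleanup happens even when `run_evaluation` raises. The two wrappers only reshape their input. The sketch below reproduces that control flow with hypothetical stand-ins (`Turn` and `Conversation` dataclasses instead of the Pydantic models, and a `FakePipeline` that is passed in rather than constructed) so the behavior can be checked without the framework or a judge LLM. It illustrates the pattern; it is not the module itself.

```python
from dataclasses import dataclass, replace
from typing import Optional


# Hypothetical stand-ins for TurnData / EvaluationData (the real ones are Pydantic models).
@dataclass(frozen=True)
class Turn:
    turn_id: str
    query: str
    response: str
    turn_metrics: Optional[list[str]] = None


@dataclass
class Conversation:
    conversation_group_id: str
    turns: list[Turn]


class FakePipeline:
    """Hypothetical stand-in for EvaluationPipeline: records close(), can be forced to fail."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.closed = False

    def run_evaluation(self, data: list[Conversation]) -> list[str]:
        if self.fail:
            raise RuntimeError("simulated evaluation failure")
        # One fake result string per metric per turn.
        return [
            f"{conv.conversation_group_id}/{turn.turn_id}/{metric}"
            for conv in data
            for turn in conv.turns
            for metric in (turn.turn_metrics or [])
        ]

    def close(self) -> None:
        self.closed = True


def evaluate(pipeline: FakePipeline, data: list[Conversation]) -> list[str]:
    """Same control flow as api.evaluate, but the pipeline is injected for testing."""
    if not data:
        return []  # empty input: return before doing any pipeline work
    try:
        return pipeline.run_evaluation(data)
    finally:
        pipeline.close()  # runs whether run_evaluation returns or raises


def evaluate_turn(
    pipeline: FakePipeline,
    turn: Turn,
    metrics: Optional[list[str]] = None,
    conversation_group_id: str = "programmatic_eval",
) -> list[str]:
    """Wrap one turn in a conversation, optionally overriding its metrics on a copy."""
    if metrics is not None:
        turn = replace(turn, turn_metrics=metrics)  # caller's object is not modified
    return evaluate(pipeline, [Conversation(conversation_group_id, [turn])])
```

In the real `evaluate_turn`, the copy is made with `TurnData.model_validate({**turn.model_dump(), "turn_metrics": metrics})`, which re-runs model validation on the new metric list. Pydantic's `model_copy(update=...)` would also leave the original untouched, but it does not validate the updated values.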

src/lightspeed_evaluation/core/system/loader.py

Lines changed: 46 additions & 0 deletions
```diff
@@ -76,6 +76,52 @@ def __init__(self) -> None:
         self.evaluation_data: Optional[list[EvaluationData]] = None
         self.logger: Optional[logging.Logger] = None

+    @classmethod
+    def from_config(cls, system_config: SystemConfig) -> "ConfigLoader":
+        """Create a fully-initialized ConfigLoader from an existing SystemConfig.
+
+        This allows programmatic use of the evaluation pipeline without
+        loading configuration from a YAML file.
+
+        Args:
+            system_config: A pre-built SystemConfig instance.
+
+        Returns:
+            A fully-initialized ConfigLoader ready for pipeline use.
+        """
+        loader = cls()
+        loader.system_config = system_config
+
+        config_data = cls._build_config_data_from_system_config(system_config)
+        setup_environment_variables(config_data)
+        loader.logger = setup_logging(system_config.logging)
+
+        populate_metric_mappings(system_config)
+
+        return loader
+
+    @staticmethod
+    def _build_config_data_from_system_config(
+        system_config: SystemConfig,
+    ) -> dict[str, Any]:
+        """Build the minimal config dict needed by setup_environment_variables.
+
+        Extracts SSL-related fields so that ``create_ssl_certifi_bundle``
+        can discover custom certificate paths.
+
+        Args:
+            system_config: The SystemConfig to extract SSL fields from.
+
+        Returns:
+            A dict suitable for ``setup_environment_variables``.
+        """
+        return {
+            "llm": {
+                "ssl_verify": system_config.llm.ssl_verify,
+                "ssl_cert_file": system_config.llm.ssl_cert_file,
+            },
+        }
+
     def load_system_config(self, config_path: str) -> SystemConfig:
         """Load system configuration from YAML file."""
         try:
```
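
`from_config` is an alternate constructor: it starts from an in-memory `SystemConfig` and performs the setup steps shown in the diff (environment variables, logging, metric mappings) without reading a YAML file. Because `setup_environment_variables` takes a raw config dict, `_build_config_data_from_system_config` rebuilds only the two SSL keys under `llm`. The sketch below mirrors that dict shape with hypothetical dataclass stand-ins; the diff does not show the real field types or defaults, so `ssl_verify: bool = True` and `ssl_cert_file: Optional[str] = None` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


# Hypothetical stand-ins; field types and defaults are assumptions, not the real models.
@dataclass
class LLMSection:
    provider: str = "openai"
    model: str = "gpt-4o-mini"
    ssl_verify: bool = True
    ssl_cert_file: Optional[str] = None


@dataclass
class SystemConfigStub:
    llm: LLMSection = field(default_factory=LLMSection)


def build_config_data(system_config: SystemConfigStub) -> dict[str, Any]:
    """Mirror _build_config_data_from_system_config: only the SSL keys under "llm"."""
    return {
        "llm": {
            "ssl_verify": system_config.llm.ssl_verify,
            "ssl_cert_file": system_config.llm.ssl_cert_file,
        },
    }


if __name__ == "__main__":
    # Illustrative certificate path; provider/model are deliberately left out of the dict.
    cfg = SystemConfigStub(llm=LLMSection(ssl_cert_file="/etc/pki/custom-ca.pem"))
    print(build_config_data(cfg))
```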
