| title | DeepEval |
|---|---|
| id | integrations-deepeval |
| description | DeepEval integration for Haystack |
| slug | /integrations-deepeval |
A component that uses the DeepEval framework
to evaluate inputs against a specific metric. Supported metrics are defined by DeepEvalMetric.
Usage example:

    from haystack_integrations.components.evaluators.deepeval import DeepEvalEvaluator, DeepEvalMetric

    evaluator = DeepEvalEvaluator(
        metric=DeepEvalMetric.FAITHFULNESS,
        metric_params={"model": "gpt-4"},
    )
    output = evaluator.run(
        questions=["Which is the most popular global sport?"],
        contexts=[
            [
                "Football is undoubtedly the world's most popular sport with "
                "major events like the FIFA World Cup and sports personalities "
                "like Ronaldo and Messi, drawing a followership of more than 4 "
                "billion people."
            ]
        ],
        responses=["Football is the most popular sport with around 4 billion followers worldwide"],
    )
    print(output["results"])

def __init__(metric: str | DeepEvalMetric, metric_params: dict[str, Any] | None = None)

Construct a new DeepEval evaluator.
Arguments:
metric: The metric to use for evaluation.
metric_params: Parameters to pass to the metric's constructor. Refer to the DeepEvalMetric class for more details on required parameters.
@component.output_types(results=list[list[dict[str, Any]]])
def run(**inputs: Any) -> dict[str, Any]

Run the DeepEval evaluator on the provided inputs.
Arguments:
inputs: The inputs to evaluate. These are determined by the metric being calculated. See DeepEvalMetric for more information.
Returns:
A dictionary with a single results entry that contains
a nested list of metric results. Each input can have one or more
results, depending on the metric. Each result is a dictionary
containing the following keys and values:
name - The name of the metric.
score - The score of the metric.
explanation - An optional explanation of the score.
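
The nested structure described above can be walked with two loops: the outer list has one entry per evaluated input, the inner list one dictionary per metric result. The following is a minimal sketch. The `output` dictionary is hand-written illustrative data shaped like the description (it is not real evaluator output), and `summarize_results` is a hypothetical helper name, not part of the integration.

```python
from typing import Any


def summarize_results(output: dict[str, Any]) -> list[str]:
    """Flatten the nested ``results`` list into one readable line per metric result."""
    lines = []
    # Outer list: one entry per evaluated input.
    for index, input_results in enumerate(output["results"]):
        # Inner list: one or more result dictionaries for that input.
        for result in input_results:
            line = f"input {index}: {result['name']} = {result['score']:.2f}"
            # The explanation is optional, so guard against a missing or empty value.
            explanation = result.get("explanation")
            if explanation:
                line += f" ({explanation})"
            lines.append(line)
    return lines


# Illustrative data only; real scores and explanations come from evaluator.run().
output = {
    "results": [
        [{"name": "faithfulness", "score": 1.0, "explanation": "No contradictions found."}],
        [{"name": "faithfulness", "score": 0.5, "explanation": None}],
    ]
}

for line in summarize_results(output):
    print(line)
```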
def to_dict() -> dict[str, Any]

Serializes the component to a dictionary.
Raises:
DeserializationError: If the component cannot be serialized.
Returns:
Dictionary with serialized data.
@classmethod
def from_dict(cls, data: dict[str, Any]) -> "DeepEvalEvaluator"

Deserializes the component from a dictionary.
Arguments:
data: Dictionary to deserialize from.
Returns:
Deserialized component.
Metrics supported by DeepEval.
All metrics require a model parameter, which specifies
the model to use for evaluation. Refer to the DeepEval
documentation for information on the supported models.
Answer relevancy.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]
Faithfulness.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]
Contextual precision.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str], ground_truths: List[str]
The ground truth is the expected response.
Contextual recall.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str], ground_truths: List[str]
The ground truth is the expected response.
Contextual relevance.
Inputs - questions: List[str], contexts: List[List[str]], responses: List[str]
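
Because each metric expects a different set of inputs, it can help to check them before calling `run`. The sketch below is an illustration under stated assumptions, not part of the integration: `REQUIRED_INPUTS` and `check_inputs` are hypothetical names, the lowercase metric keys are placeholder labels rather than the library's identifiers, and the equal-length check assumes that entries at the same index in each list describe the same sample.

```python
from typing import Any

# Hypothetical lookup table transcribed from the metric descriptions above.
# The lowercase keys are placeholder labels, not identifiers from the library.
REQUIRED_INPUTS: dict[str, tuple[str, ...]] = {
    "answer_relevancy": ("questions", "contexts", "responses"),
    "faithfulness": ("questions", "contexts", "responses"),
    "contextual_precision": ("questions", "contexts", "responses", "ground_truths"),
    "contextual_recall": ("questions", "contexts", "responses", "ground_truths"),
    "contextual_relevance": ("questions", "contexts", "responses"),
}


def check_inputs(metric: str, **inputs: Any) -> None:
    """Raise ValueError if inputs for ``metric`` are missing or differ in length."""
    required = REQUIRED_INPUTS[metric]
    missing = [name for name in required if name not in inputs]
    if missing:
        raise ValueError(f"{metric} is missing inputs: {', '.join(missing)}")
    # Assumption: item i of every list describes the same sample.
    lengths = {name: len(inputs[name]) for name in required}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"{metric} inputs differ in length: {lengths}")


check_inputs(
    "contextual_recall",
    questions=["Which is the most popular global sport?"],
    contexts=[["Football is the world's most popular sport."]],
    responses=["Football is the most popular sport."],
    ground_truths=["Football"],
)
print("inputs look consistent")
```

Keeping the requirements in a single table means a new metric only needs one extra entry, and the same check can run before every evaluation.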
@classmethod
def from_str(cls, string: str) -> "DeepEvalMetric"

Create a metric type from a string.
Arguments:
string: The string to convert.
Returns:
The metric.
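
One common way to implement this kind of string-to-enum conversion is a lookup over the enum's values. The sketch below uses a stand-in `MetricSketch` enum rather than the real `DeepEvalMetric`. The member values and the error raised for unknown strings are assumptions made for illustration and are not taken from the library.

```python
from enum import Enum


class MetricSketch(Enum):
    """Stand-in enum for illustration; member values are assumed, not the library's."""

    ANSWER_RELEVANCY = "answer_relevancy"
    FAITHFULNESS = "faithfulness"
    CONTEXTUAL_PRECISION = "contextual_precision"
    CONTEXTUAL_RECALL = "contextual_recall"
    CONTEXTUAL_RELEVANCE = "contextual_relevance"

    @classmethod
    def from_str(cls, string: str) -> "MetricSketch":
        # Compare the input against each member's value and return the first match.
        for member in cls:
            if member.value == string:
                return member
        # Assumed behavior: reject unknown names and list the accepted ones.
        valid = ", ".join(member.value for member in cls)
        raise ValueError(f"Unknown metric '{string}'. Supported metrics: {valid}")


metric = MetricSketch.from_str("faithfulness")
print(metric.name)
```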