Allowing other LLMs and custom prompts in evaluation (specifically, deepeval) #1872

Description

@sanjayc2

Is your feature request related to a problem? Please describe.
I cannot use a small local LLM or custom prompts to evaluate the output of a RAG pipeline. Smaller LLMs (e.g., minicheck) have become as good as GPT for evaluation.

Describe the solution you'd like
I would like to use a small local LLM to evaluate the output of a RAG pipeline. At the moment, it seems that only GPT models are supported. Smaller LLMs (e.g., minicheck) have become as good as GPT for evaluation, and these local LLMs are available via Ollama. There also does not seem to be a way to customize the prompts used in haystack-deepeval.
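
To make the request concrete, here is a rough sketch of the kind of interface I have in mind. It is not the haystack-deepeval or deepeval API: `FaithfulnessEvaluator`, `DEFAULT_PROMPT`, and `ollama_generate` are hypothetical names I made up for illustration. The only external piece it assumes is a local Ollama server exposing its `/api/generate` endpoint on the default port, and the model tag is left to the caller.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical default prompt; NOT the prompt used by deepeval or haystack-deepeval.
DEFAULT_PROMPT = (
    "Context:\n{context}\n\nAnswer:\n{answer}\n\n"
    "Is the answer fully supported by the context? Reply YES or NO."
)


def ollama_generate(prompt: str, model: str,
                    url: str = "http://localhost:11434/api/generate") -> str:
    """Send one non-streaming request to a local Ollama server (assumed to be running)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


class FaithfulnessEvaluator:
    """Hypothetical evaluator that accepts any LLM callable and any prompt template."""

    def __init__(self, llm: Callable[[str], str], prompt_template: str = DEFAULT_PROMPT):
        self.llm = llm
        self.prompt_template = prompt_template

    def run(self, contexts: List[str], answer: str) -> Dict:
        # Fill the (possibly user-supplied) template, then ask the injected LLM.
        prompt = self.prompt_template.format(context="\n".join(contexts), answer=answer)
        verdict = self.llm(prompt).strip().upper()
        # Treat any reply starting with YES as supported; everything else scores 0.
        return {"score": 1.0 if verdict.startswith("YES") else 0.0, "prompt": prompt}
```

With something like this, a local model could be plugged in via `FaithfulnessEvaluator(llm=lambda p: ollama_generate(p, model="<your-model-tag>"))`, and a chain-of-thought prompt could be passed as `prompt_template` without touching the evaluator itself.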

Describe alternatives you've considered
Use deepeval "offline", i.e., save the question, contexts (chunks), and answer for each query, then run deepeval locally on the saved data. This is not very convenient, since I would like to be able to evaluate while fine-tuning the model.
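
For reference, the offline workaround looks roughly like the sketch below. The JSONL layout and the helper names `save_records` and `evaluate_offline` are my own assumptions, and `metric` stands in for whatever scoring call is run locally (it is a placeholder, not a deepeval function).

```python
import json
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def save_records(path: Path, records: Iterable[Dict]) -> None:
    """Write one JSON object per line, each holding question, contexts, and answer."""
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def evaluate_offline(path: Path, metric: Callable[[Dict], float]) -> List[float]:
    """Reload the saved records and score each one with the supplied metric."""
    with path.open(encoding="utf-8") as handle:
        return [metric(json.loads(line)) for line in handle if line.strip()]
```

The extra save and reload round trip is what makes this awkward inside a fine-tuning loop, where scores are needed after each checkpoint.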

Additional context
The ability to use deepeval to evaluate a model during fine-tuning is very useful. It would also be good to be able to customize the prompt, since chain-of-thought (CoT) and other prompting techniques appear to improve evaluation outputs.