Files in this directory: Bootstrap.cs, Program.cs, README.md, RagEvaluation.csproj, golden-dataset.json

Chapter 3.2.7 -- RAG Evaluation Harness

Companion code for Generative AI in .NET, Chapter 3 section 3.2.7 ("Evaluation: Measuring RAG Quality").

A minimal LLM-as-judge harness that scores candidate answers on faithfulness (supported by ground truth) and relevance (answers the question). The judge returns structured JSON via GetResponseAsync<JudgeVerdict>(...), so scores are typed integers, not parsed strings.
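To illustrate what a typed verdict buys you, here is a minimal sketch that binds a judge's JSON reply to a record using only System.Text.Json. It does not call a model. The JudgeVerdict shape shown here (Faithfulness, Relevance, Reasoning) and the 1-to-5 scale are assumptions for illustration; the real type in Program.cs may differ.

```csharp
using System;
using System.Text.Json;

// Hypothetical verdict shape: the property names and the 1-5 scale are
// assumptions, not taken from Program.cs.
public sealed record JudgeVerdict(int Faithfulness, int Relevance, string Reasoning);

public static class VerdictParser
{
    private static readonly JsonSerializerOptions Options = new()
    {
        // Lets "faithfulness" in the JSON bind to the Faithfulness parameter.
        PropertyNameCaseInsensitive = true
    };

    // Turns the judge's raw JSON into typed scores and rejects off-scale values.
    public static JudgeVerdict Parse(string json)
    {
        // Throws JsonException if the payload is malformed or a score is not a JSON number.
        JudgeVerdict verdict = JsonSerializer.Deserialize<JudgeVerdict>(json, Options)
            ?? throw new JsonException("Judge returned a JSON null instead of a verdict.");

        EnsureInRange(verdict.Faithfulness, nameof(verdict.Faithfulness));
        EnsureInRange(verdict.Relevance, nameof(verdict.Relevance));
        return verdict;
    }

    private static void EnsureInRange(int score, string name)
    {
        if (score < 1 || score > 5)
            throw new ArgumentOutOfRangeException(name, score, "Expected a score from 1 to 5.");
    }
}
```

A reply such as {"faithfulness": 4, "relevance": 5, "reasoning": "..."} comes back as integers you can average directly, while a reply with a missing or off-scale score fails loudly instead of being silently miscounted.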

The bundled golden-dataset.json is a 3-row demo. Replace it with your own questions and ground truths, and replace the candidateAnswer placeholder with the answer your real RAG pipeline produces.
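One way to wire in your own data and pipeline is sketched below. It assumes the dataset is a JSON array of objects with question and groundTruth properties, which may not match the bundled file, and it models the RAG pipeline as a plain Func<string, string>. QaPair's fields, GoldenDataset, and ragPipeline are hypothetical names used for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Assumed row shape; check golden-dataset.json for the real property names.
public sealed record QaPair(string Question, string GroundTruth);

public static class GoldenDataset
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Parses the dataset and fails fast on rows that cannot be evaluated.
    public static List<QaPair> Parse(string json)
    {
        var rows = JsonSerializer.Deserialize<List<QaPair>>(json, Options) ?? new List<QaPair>();
        foreach (var row in rows)
        {
            if (row is null
                || string.IsNullOrWhiteSpace(row.Question)
                || string.IsNullOrWhiteSpace(row.GroundTruth))
            {
                throw new FormatException("Every row needs a question and a ground truth.");
            }
        }
        return rows;
    }

    // Runs each question through the supplied pipeline to get the candidate answer
    // that the judge will later score against the ground truth.
    public static List<(QaPair Row, string CandidateAnswer)> Answer(
        IEnumerable<QaPair> rows, Func<string, string> ragPipeline)
    {
        var results = new List<(QaPair Row, string CandidateAnswer)>();
        foreach (var row in rows)
        {
            results.Add((row, ragPipeline(row.Question)));
        }
        return results;
    }
}
```

In practice you would pass File.ReadAllText("golden-dataset.json") to Parse and hand Answer a delegate that calls your retrieval and generation code.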

Bootstrap.cs ships a small Q&A-pair generator that reads chunks of your indexed corpus and proposes one QaPair per chunk via the same IChatClient plumbing. Use it to seed a golden dataset from a corpus with no existing user logs, then human-review the result before checking it in.
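The generator's overall flow might look roughly like the sketch below. This is not the code in Bootstrap.cs: the prompt wording, the QaPair fields, and the askModel delegate (standing in for the IChatClient call) are all assumptions chosen so that the example runs without a model.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Threading.Tasks;

// Assumed shape; the real QaPair in Bootstrap.cs may differ.
public sealed record QaPair(string Question, string GroundTruth);

public static class BootstrapSketch
{
    private static readonly JsonSerializerOptions Options = new()
    {
        PropertyNameCaseInsensitive = true
    };

    // Proposes at most one Q&A pair per chunk. askModel is a hypothetical
    // stand-in for the chat client: it takes a prompt and returns raw JSON.
    public static async Task<List<QaPair>> GenerateAsync(
        IEnumerable<string> chunks, Func<string, Task<string>> askModel)
    {
        var pairs = new List<QaPair>();
        foreach (var chunk in chunks)
        {
            if (string.IsNullOrWhiteSpace(chunk)) continue; // nothing to ask about

            string prompt =
                "Write one question that the passage below answers, and the answer. " +
                "Reply as JSON with \"question\" and \"groundTruth\".\n\nPassage:\n" + chunk;

            string json = await askModel(prompt);
            QaPair pair = JsonSerializer.Deserialize<QaPair>(json, Options);

            // Keep only complete pairs; incomplete ones are dropped, not guessed.
            if (pair is not null
                && !string.IsNullOrWhiteSpace(pair.Question)
                && !string.IsNullOrWhiteSpace(pair.GroundTruth))
            {
                pairs.Add(pair);
            }
        }
        return pairs;
    }
}
```

The output is only a draft: generated pairs still need the human review described above before they go into the golden dataset.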

Run it

export OPENAI_API_KEY=sk-...
dotnet run --project samples/ch03-rag/03.2.7-evaluation

Manuscript reference

Manuscript/Chapter-03.md, section 3.2.7.
Figure 3.8 ("RAG Evaluation Pipeline") visualizes the same harness with a CI gate.
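A CI gate over the judge's scores can be as small as the sketch below, which averages each metric and compares it to a threshold. The CiGate name, the mean-based rule, and the 4.0 defaults are assumptions for illustration, not values taken from the chapter or from Program.cs.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CiGate
{
    // Returns true when both mean scores meet their thresholds.
    // The 4.0 defaults are illustrative; tune them to your own baseline.
    public static bool Passes(
        IReadOnlyCollection<(int Faithfulness, int Relevance)> scores,
        double minFaithfulness = 4.0,
        double minRelevance = 4.0)
    {
        // An empty run proves nothing, so treat it as a failure.
        if (scores.Count == 0) return false;

        double meanFaithfulness = scores.Average(s => s.Faithfulness);
        double meanRelevance = scores.Average(s => s.Relevance);

        return meanFaithfulness >= minFaithfulness && meanRelevance >= minRelevance;
    }

    // Maps the gate result to a process exit code a CI runner can act on.
    public static int ExitCode(IReadOnlyCollection<(int Faithfulness, int Relevance)> scores)
        => Passes(scores) ? 0 : 1;
}
```

Returning a non-zero exit code from the program's entry point is what lets a CI step fail the build when quality regresses.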

Prerequisites

.NET 9 SDK, plus an OpenAI API key or credentials for any other IChatClient provider.
