
Commit ff9e623

RHIDP-12086: Lightspeed Evaluation Framework documentation (redhat-developer#2161)
* Lightspeed Evaluation Framework documentation
* Incorporate CQA comments
* CQA changes 2
* CQA check
* Minor fix
* Incorporating Gerry's comments
* JTBD updates
* Incorporated Heena's comment
1 parent f984343 commit ff9e623

8 files changed

Lines changed: 243 additions & 0 deletions
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
:_mod-docs-content-type: ASSEMBLY
ifdef::context[:parent-context: {context}]

[id="ai-model-evaluation-data-to-select-the-right-ai-model_{context}"]
= AI model evaluation data to select the right AI model

:context: assembly-evaluate-developer-lightspeed-performance

[role="_abstract"]
Use the {ls-brand-name} evaluation framework to validate the performance, accuracy, and reliability of {ls-short}.

With this automated toolset, you can measure how effectively various large language models (LLMs) answer questions based on {product} documentation.

.Components of the evaluation framework
[cols="1,3",options="header"]
|===
| Component | Description
| Evaluation framework | Contains the core logic and scripts used to run evaluations.
| Datasets | Includes the input files used to test the model.
| Evaluation metrics integration | Provides scoring through various metrics, including Ragas, DeepEval, and custom metrics. Ragas is the primary metrics framework used to validate {ls-short} performance.
|===

include::../modules/shared/proc-configure-the-evaluation-environment-to-validate-model-accuracy.adoc[leveloffset=+1]

include::../modules/shared/proc-prepare-evaluation-datasets-to-verify-ai-generated-responses.adoc[leveloffset=+1]

include::../modules/shared/proc-run-performance-tests-to-ensure-ai-response-reliability.adoc[leveloffset=+1]

include::../modules/shared/proc-analyze-evaluation-results-to-identify-performance-gaps.adoc[leveloffset=+1]

include::../modules/shared/ref-evaluation-metrics-and-historical-data-reference.adoc[leveloffset=+1]

include::../modules/shared/ref-release-report-and-historical-data.adoc[leveloffset=+1]

ifdef::parent-context[:context: {parent-context}]
ifndef::parent-context[:!context:]
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
:_mod-docs-content-type: PROCEDURE

[id="analyze-evaluation-results-to-identify-performance-gaps_{context}"]
= Analyze evaluation results to identify performance gaps

[role="_abstract"]
Determine the performance of {ls-short} and identify documentation areas that require model improvement by analyzing evaluation results in the repository. You can use these reports to compare performance across different large language models (LLMs) and topics.

.Prerequisites
* You must have access to the link:https://github.com/redhat-ai-dev/developer-lightspeed-evaluation/tree/main[`developer-lightspeed-evaluation` repository].

.Procedure

. In the root of the repository, navigate to the version-specific folder within the link:https://github.com/redhat-ai-dev/developer-lightspeed-evaluation/tree/main/evaluation-result[`/evaluation-result`] directory.
. Open the following files to evaluate performance:

** Model Pass Rate: Compare the overall performance between different LLMs.
** Topic Pass Rate: Identify performance trends and gaps within specific documentation areas.

.Verification

* Verify that the reports display data visualizations or metrics consistent with your recent evaluation run.
Lines changed: 62 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,62 @@
1+
:_mod-docs-content-type: PROCEDURE
2+
3+
[id="configure-the-evaluation-environment-to-validate-model-accuracy_{context}"]
4+
= Configure the evaluation environment to validate model accuracy
5+
6+
[role="_abstract"]
7+
Set up the evaluation environment to validate the performance and accuracy of {ls-short}. Configure this evaluation to ensure the model correctly interprets documentation and provides dependable answers.
8+
9+
By performing these evaluations, you minimize the risk of the model delivering incorrect or hallucinated information to users in production.
10+
11+
.Prerequisites
12+
13+
* Install *uv* for Python package management (Python 3.11 or later).
14+
15+
.Procedure
16+
17+
. Clone the evaluation repository and navigate to the directory:
18+
+
19+
[source,bash]
20+
----
21+
git clone https://github.com/lightspeed-core/lightspeed-evaluation
22+
cd lightspeed-evaluation
23+
----
24+
25+
. Synchronize the environment and install dependencies:
26+
+
27+
[source,bash]
28+
----
29+
uv sync
30+
----
31+
32+
. Configure the environment variables for the judge LLM. You can create a `.env` file in the root directory or export the keys directly to your terminal.
33+
** If you use Gemini, you must set the Gemini API key:
34+
+
35+
[source,bash]
36+
----
37+
export GEMINI_API_KEY="your-google-api-key"
38+
----
39+
** If you use OpenAI, you must set the OpenAI API key:
40+
+
41+
[source,bash]
42+
----
43+
export OPENAI_API_KEY="your-key"
44+
----
45+
46+
. Optional: If you test with a live service, set your {ls-short} service API key:
47+
+
48+
[source,bash]
49+
----
50+
export API_KEY="your-lightspeed-service-key"
51+
----
52+
53+
.Verification
54+
55+
* Verify that the environment is synchronized and the virtual environment is active:
56+
+
57+
[source,bash]
58+
----
59+
uv run python --version
60+
----
61+
+
62+
The output must return Python 3.11 or later.
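
As a rough illustration of the judge LLM configuration step, the following sketch loads a `.env` file into the current shell and reports which judge key is set. The `load_env_file` and `judge_key_status` helpers are hypothetical names introduced here and are not part of the evaluation tool. The sketch assumes that the `.env` file contains plain `KEY=value` lines that are safe to source.

```shell
# Hypothetical helper (not part of the evaluation tool): export every
# variable defined in a .env-style file into the current shell.
# Assumes plain KEY=value lines that are safe to source.
load_env_file() {
  env_file="${1:-./.env}"
  [ -f "$env_file" ] || return 0   # nothing to load
  set -a                           # export all variables assigned while sourcing
  . "$env_file"
  set +a
}

# Hypothetical helper: print which judge LLM key is available.
# Returns a non-zero status when neither key is set.
judge_key_status() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini"
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    echo "openai"
  else
    echo "missing"
    return 1
  fi
}
```

Calling `load_env_file` followed by `judge_key_status` before an evaluation run can catch a missing key early, instead of partway through a run.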
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
:_mod-docs-content-type: PROCEDURE
2+
3+
[id="prepare-evaluation-datasets-to-verify-ai-generated-responses_{context}"]
4+
= Prepare evaluation datasets to verify AI-generated responses
5+
6+
[role="_abstract"]
7+
Prepare evaluation datasets to test the performance of {ls-short}. You can use pre-generated AI datasets for specific {product} releases or generate custom AI datasets from your own documentation.
8+
9+
.Prerequisites
10+
11+
* You must clone the evaluation repository to your local machine.
12+
13+
.Procedure
14+
15+
. Download pre-generated datasets: Use this method to test the performance of specific {product-very-short} releases. These datasets are generated using link:https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/[Ragas testset generation for RAG].
16+
17+
.. In your terminal, navigate to the link:https://github.com/redhat-ai-dev/developer-lightspeed-evaluation/tree/main/dataset[/dataset folder] in the evaluation repository.
18+
.. Locate the `.evaluation_dataset_yaml` files. These files are pre-configured for the evaluation tool.
19+
.. To test a historical release, switch to the corresponding branch.
20+
+
21+
--
22+
For example, to access the {product} 1.8 dataset, switch to the `1.8` branch.
23+
24+
[IMPORTANT]
25+
====
26+
The `main` branch contains work-in-progress (WIP) datasets. Avoid using this branch for stable evaluations.
27+
====
28+
--
29+
30+
. Generate custom datasets: Use this method to create a new test set from your own technical documentation.
31+
32+
.. Generate a diverse set of question-and-answer (Q&A) pairs by following the link:https://docs.ragas.io/en/stable/concepts/test_data_generation/rag/[Ragas test data generation documentation].
33+
34+
.. Ensure your Q&A pairs match the required format by link:https://github.com/lightspeed-core/lightspeed-evaluation?tab=readme-ov-file[reviewing the evaluation data structure configuration].
35+
36+
.Verification
37+
38+
* Verify that your custom dataset matches the required schema before you start the evaluation run.
39+
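
The branch-switching step for historical releases might look like the following sketch. The `switch_release` helper is a hypothetical name introduced for illustration. It assumes a local clone of the dataset repository and refuses the `main` branch, because that branch holds work-in-progress datasets.

```shell
# Hypothetical helper: check out a release branch in a local clone of the
# dataset repository, refusing the work-in-progress main branch.
switch_release() {
  repo_dir="$1"
  release="$2"
  if [ "$release" = "main" ]; then
    echo "main holds work-in-progress datasets; choose a release branch" >&2
    return 1
  fi
  git -C "$repo_dir" checkout --quiet "$release"
}

# Example usage (the clone requires network access):
#   git clone https://github.com/redhat-ai-dev/developer-lightspeed-evaluation
#   switch_release developer-lightspeed-evaluation 1.8
#   ls developer-lightspeed-evaluation/dataset
```

Wrapping the checkout in a function keeps the warning about `main` close to the command that it protects.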
Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,41 @@
1+
:_mod-docs-content-type: PROCEDURE
2+
3+
[id="run-performance-tests-to-ensure-ai-response-reliability_{context}"]
4+
= Run performance tests to ensure AI response reliability
5+
6+
[role="_abstract"]
7+
Use the evaluation framework to run performance tests in either static mode to evaluate pre-recorded responses or dynamic mode to call a live service.
8+
9+
These evaluations identify performance gaps, allow you to compare different large language models (LLMs), and ensure that {ls-short} provides reliable information to users.
10+
11+
.Prerequisites
12+
13+
* You must link:https://github.com/lightspeed-core/lightspeed-evaluation#installation[install and configure the evaluation environment].
14+
* You must prepare an evaluation dataset.
15+
16+
.Procedure
17+
. Download the link:https://github.com/lightspeed-core/lightspeed-evaluation/blob/main/config/system.yaml[`system.yaml` configuration template] from the repository.
18+
. Configure the parameters in the `system.yaml` file based on your evaluation mode:
19+
+
20+
[cols="1,3",options="header"]
21+
|===
22+
| Field | Description
23+
| `llm` | Defines the judge LLM that scores the responses, such as `gemini-2.5-pro`.
24+
| `api.enabled` | Set to `false` for static mode to use pre-filled data. Set to `true` for dynamic mode to call a live service.
25+
| `api.api_base` | (Required for dynamic mode only) Provide the URL of your {ls-short} service.
26+
| `api.endpoint_type` | Specify the service configuration type: `streaming` or `query`.
27+
|===
28+
29+
. Execute the evaluation by using the `lightspeed-eval` command:
30+
+
31+
[source,bash]
32+
----
33+
lightspeed-eval \
34+
--system-config config/system.yaml \
35+
--eval-data config/evaluation_data.yaml \
36+
--output-dir ./my_evaluation_results
37+
----
38+
39+
.Verification
40+
41+
* Navigate to the specified output directory and verify that the generated reports contain the model performance scores.
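
The verification step can be partly scripted. The following sketch only checks that the output directory exists and contains at least one file. The `has_reports` helper is a hypothetical name introduced here. It does not inspect report contents, so you still need to review the scores manually.

```shell
# Hypothetical helper: succeed only when the output directory exists and
# contains at least one regular file. It does not inspect report contents.
has_reports() {
  out_dir="$1"
  [ -d "$out_dir" ] || return 1
  [ -n "$(find "$out_dir" -type f | head -n 1)" ]
}

# Example usage after an evaluation run (paths taken from the procedure):
#   lightspeed-eval \
#     --system-config config/system.yaml \
#     --eval-data config/evaluation_data.yaml \
#     --output-dir ./my_evaluation_results
#   has_reports ./my_evaluation_results && echo "reports found"
```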
Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
:_mod-docs-content-type: REFERENCE
2+
3+
[id="evaluation-metrics-and-historical-data-reference_{context}"]
4+
= Evaluation metrics and historical data reference
5+
6+
[role="_abstract"]
7+
Use the link:https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/[available metrics] to evaluate the performance of {ls-short} at the conversation turn level.
8+
9+
These metrics provide a standardized way to measure the accuracy and reliability of the generated responses and the retrieved content.
10+
11+
[cols="1,3",options="header"]
12+
|===
13+
| Metric | Description
14+
| `Faithfulness` | Measures how well the answer is derived solely from the retrieved context.
15+
| `Context recall` | Measures whether the retrieved context contains all information required to answer the question.
16+
| `Context relevance` | Verifies if the retrieved documentation chunks are relevant to the user query.
17+
| `Context precision without reference` | Measures the ratio of useful information within the retrieved documentation chunks.
18+
| `Answer correctness` | Compares the generated response against the expected ground-truth response. This custom metric is implemented in the evaluation tool.
19+
|===
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
1+
:_mod-docs-content-type: REFERENCE
2+
3+
[id="release-report-and-historical-data_{context}"]
4+
= Release report and historical data
5+
6+
[role="_abstract"]
7+
Use the link:https://github.com/redhat-ai-dev/developer-lightspeed-evaluation[latest Q&A dataset and evaluation results] to monitor the current performance of {ls-short}.
8+
9+
Access version-specific branches that contain the datasets and evaluation results required to track improvements or regressions across product releases.
10+
11+
[IMPORTANT]
12+
====
13+
The `main` branch contains work-in-progress data for versions currently under development. For stable evaluations or historical tracking, you must switch to the branch associated with a specific release.
14+
====
15+
16+
[cols="1,1,2",options="header"]
17+
|===
18+
| Release version | Branch name | Data included
19+
| Latest stable | Most recent version branch | The current question and answer (Q&A) dataset and evaluation results.
20+
| Historical | Previous version branches | Datasets and evaluation results for previous releases to track regressions.
21+
|===
22+

titles/integrate_interacting-with-developer-lightspeed-for-rhdh/master.adoc

Lines changed: 2 additions & 0 deletions
@@ -28,6 +28,8 @@ include::assemblies/shared/assembly-customize.adoc[leveloffset=+1]
 
 include::assemblies/shared/assembly-get-ai-assisted-help-for-your-development-tasks.adoc[leveloffset=+1]
 
+include::assemblies/shared/assembly-ai-model-evaluation-data-to-select-the-right-ai-model.adoc[leveloffset=+1]
+
 include::assemblies/shared/assembly-appendix-llm-requirements.adoc[leveloffset=+1]
 
 include::assemblies/shared/assembly-appendix-about-user-data-security.adoc[leveloffset=+1]
