ORAssistant Automated Evaluation

This project automates the evaluation of language model responses using classification-based metrics and LLMScore. It supports testing against models from various providers, including OpenAI and Google Vertex AI. It also serves as an evaluation benchmark for comparing multiple versions of ORAssistant.

Features

Classification-based Metrics:
- Categorizes responses into True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).
- Computes metrics such as Accuracy, Precision, Recall, and F1 Score.
LLMScore:
- Assigns a score between 0 and 1 that rates the quality and accuracy of the generated response against the ground truth.
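The classification metrics listed above follow their standard definitions: Accuracy = (TP + TN) / total, Precision = TP / (TP + FP), Recall = TP / (TP + FN), and F1 = 2 · Precision · Recall / (Precision + Recall). The sketch below computes them from the four counts. The `ConfusionCounts` and `compute_metrics` names are hypothetical and do not come from this project's code.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of classified responses (hypothetical container)."""
    tp: int
    tn: int
    fp: int
    fn: int


def safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising ZeroDivisionError when a denominator is zero.
    return num / den if den else 0.0


def compute_metrics(c: ConfusionCounts) -> dict:
    """Standard Accuracy, Precision, Recall and F1 from confusion counts."""
    total = c.tp + c.tn + c.fp + c.fn
    accuracy = safe_div(c.tp + c.tn, total)
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = compute_metrics(ConfusionCounts(tp=8, tn=5, fp=2, fn=1))
```

Returning 0.0 for an empty denominator is a design choice that keeps a run with, for example, no positive predictions from crashing. The project itself may handle that case differently.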

Setup

Environment Variables

Create a .env file in the root directory with the following variables:

```
GOOGLE_API_KEY=your_google_api_key
OPENAI_API_KEY=your_openai_api_key
```
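To catch a missing key before starting a long run, you can check the file yourself. The stdlib-only sketch below parses simple `KEY=value` lines and reports which of the two required keys are absent or empty. The helper names are hypothetical, and this is not necessarily how main.py reads the file.

```python
from pathlib import Path

# The two variables this README asks for.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path: str) -> dict:
    """Read and parse a .env file from disk (hypothetical helper)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def missing_keys(values: dict) -> list:
    # Report required keys that are absent or set to an empty string.
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```

For example, `missing_keys(load_env_file(".env"))` returns an empty list when both keys are set.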

Data Files

Input File: data/data.csv
- A CSV file containing the questions to test, with two columns: Question and Answer.
Output File: data/data_result.csv
- This file will be generated after running the script. It contains the results of the evaluation.
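A quick way to confirm the input file has the expected columns is to read it with Python's `csv.DictReader` and fail early if `Question` or `Answer` is missing. The sketch below does this. `read_questions` is a hypothetical helper and the sample rows are placeholders.

```python
import csv
import io

# Column names required by the input file, as listed above.
REQUIRED_COLUMNS = ("Question", "Answer")


def read_questions(stream) -> list:
    """Read rows from a CSV stream, raising if a required column is missing."""
    reader = csv.DictReader(stream)
    header = reader.fieldnames or []  # reading fieldnames consumes the header row
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    return list(reader)


# Placeholder data; for the real file use:
#   with open("data/data.csv", newline="", encoding="utf-8") as f:
#       rows = read_questions(f)
sample = io.StringIO("Question,Answer\nexample question,example answer\n")
rows = read_questions(sample)
```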

How to Run

Activate virtual environment

From the parent directory (evaluation), run make init before activating the virtual environment. This step is needed so that this folder is recognised as a submodule.

Run the Script

Use the following command to execute the script with customizable options:
```
python main.py --env-path /path/to/.env --iterations 10 --llms "base-gemini-1.5-flash,base-gpt-4o" --agent-retrievers "v1=http://url1.com,v2=http://url2.com"
```
- --env-path: Path to the .env file.
- --iterations: Number of iterations per question.
- --llms: Comma-separated list of LLMs to test.
- --agent-retrievers: Comma-separated list of agent-retrievers, each given as name=URL.
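The `--llms` and `--agent-retrievers` values are plain comma-separated strings. The sketch below turns them into a list and a name-to-URL mapping. The function names are hypothetical and are not taken from main.py.

```python
def parse_llms(value: str) -> list:
    # "base-gemini-1.5-flash,base-gpt-4o" -> ["base-gemini-1.5-flash", "base-gpt-4o"]
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_agent_retrievers(value: str) -> dict:
    """Parse "name=URL,name=URL" into a {name: URL} mapping."""
    retrievers = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so "=" inside a URL query string survives.
        name, sep, url = item.partition("=")
        if not sep or not name.strip() or not url.strip():
            raise ValueError(f"expected name=URL, got {item!r}")
        retrievers[name.strip()] = url.strip()
    return retrievers
```

Splitting on the first `=` keeps query strings intact. A URL that itself contains a comma would still be split incorrectly by this simple scheme.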

View Results

Results are saved in a CSV file named after the input data file with _result appended; for example, data/data.csv produces data/data_result.csv.
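The output naming rule inserts `_result` before the file extension. The sketch below expresses that rule with `pathlib`. `result_path` is a hypothetical helper.

```python
from pathlib import Path


def result_path(input_path: str) -> str:
    """Insert "_result" before the extension, keeping the same directory."""
    p = Path(input_path)
    # as_posix() keeps forward slashes so the result is the same on every OS.
    return p.with_name(f"{p.stem}_result{p.suffix}").as_posix()
```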

Basic Usage

a. Default Usage

```
python main.py
```

- Uses the default .env file in the project root.
- Uses data/data.csv as input.
- Runs 5 iterations per question.
- Tests all available LLMs.
- Adds no agent-retrievers.
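For reference, the documented defaults map naturally onto a command-line parser. The sketch below uses Python's `argparse` with those defaults. It is an illustration only: it does not claim that main.py is built this way, and the `.env` default assumes you run from the project root.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="ORAssistant automated evaluation (sketch)")
    # Relative path: resolves to the project root only if run from there.
    parser.add_argument("--env-path", default=".env", help="Path to the .env file")
    parser.add_argument("--iterations", type=int, default=5, help="Iterations per question")
    # None stands for "test all available LLMs".
    parser.add_argument("--llms", default=None, help="Comma-separated LLMs to test")
    # Empty string stands for "no additional agent-retrievers".
    parser.add_argument("--agent-retrievers", default="", help="Comma-separated name=URL pairs")
    return parser
```

`argparse` converts the dashes in option names to underscores, so the values are available as `args.env_path` and `args.agent_retrievers`. It also generates a `--help` message automatically.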

b. Specify .env Path

```
python main.py --env-path /path/to/.env
```

c. Customize Iterations and Select Specific LLMs

```
python main.py --iterations 10 --llms "base-gpt-4o,base-gemini-1.5-flash"
```

d. Add Agent-Retrievers with Custom Names

```
python main.py --agent-retrievers "v1=http://url1.com,v2=http://url2.com"
```

e. Full Example with All Options

```
python main.py \
    --env-path /path/to/.env \
    --iterations 10 \
    --llms "base-gemini-1.5-flash,base-gpt-4o" \
    --agent-retrievers "v1=http://url1.com,v2=http://url2.com"
```

f. Display Help Message

To view all available command-line options:

```
python main.py --help
```

Run Analysis

After generating results, analyze them with the provided analysis.py script, which runs as a Streamlit app:

```
streamlit run analysis.py
```

Sample Comparison Commands

To compare three versions of ORAssistant, use:

```
python main.py --agent-retrievers "orassistant-v1=http://url1.com,orassistant-v2=http://url2.com,orassistant-v3=http://url3.com"
```

Note: Each URL is the endpoint of the ORAssistant backend.

To compare ORAssistant with base-gpt-4o, use:
```
python main.py --llms "base-gpt-4o" --agent-retrievers "orassistant=http://url.com"
```
Note: The URL is the endpoint of the ORAssistant backend.
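If you script many comparisons, you can build these commands from a mapping of version names to backend URLs instead of typing the strings by hand. The sketch below returns an argument list that could be passed to `subprocess.run`. `comparison_command` is a hypothetical helper, and the URLs are the same placeholders used above.

```python
from typing import Optional


def comparison_command(retrievers: dict, llms: Optional[list] = None) -> list:
    """Build a main.py argument list from {name: backend URL} pairs."""
    argv = ["python", "main.py"]
    if llms:
        argv += ["--llms", ",".join(llms)]
    if retrievers:
        pairs = ",".join(f"{name}={url}" for name, url in retrievers.items())
        argv += ["--agent-retrievers", pairs]
    return argv


cmd = comparison_command({"orassistant": "http://url.com"}, llms=["base-gpt-4o"])
```

Passing a list rather than a single shell string avoids quoting problems, because each argument reaches main.py unchanged.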
