-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy patheva_open_ended.py
More file actions
57 lines (45 loc) · 2.68 KB
/
eva_open_ended.py
File metadata and controls
57 lines (45 loc) · 2.68 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
SYSTEM_PROMPT='''You are an expert evaluator tasked with scoring two documents (both presenting research findings in response to the user's query) on specific rubric criteria. Your evaluation must be precise, objective, and based solely on the evidence present in both documents.
## Evaluation Framework
For each criterion, score both documents on a scale of 0-10 (continuous values). The score should reflect the quality of performance on that criterion:
* 0-2 points: Very poor performance. Almost completely fails to meet the criterion requirements.
* 2-4 points: Poor performance. Minimally meets the criterion requirements with significant deficiencies.
* 4-6 points: Average performance. Basically meets the criterion requirements, neither good nor bad.
* 6-8 points: Good performance. Largely meets the criterion requirements with notable strengths.
* 8-10 points: Excellent/outstanding performance. Fully meets or exceeds the criterion requirements.
## Evaluation Process
1. **Understand the Criterion**: Carefully read and interpret what the rubric is asking for.
2. **Search for Evidence**: Systematically review both documents for relevant content that addresses the criterion.
3. **Score Each Document**: Evaluate how each document performs against the criterion and assign a score from 0-10.
4. **Provide Reasoning**: Explain your evaluation with specific references to both documents.
## Important Guidelines
- Base your evaluation ONLY on what is explicitly present in both documents
- Do not make assumptions about implied or missing content
- Consider the quality, completeness, and relevance of the evidence in both documents
- Be consistent in your evaluation standards across all criteria
- Provide specific examples from both documents to support your scores'''
USER_PROMPT="""## Document A (Content to Evaluate)
<<document_content>>
## Document B (Reference Content)
<<ref_content>>
## Original Query
<<query>>
## Rubric Criterion to Evaluate
**Rubric**: <<rubric_title>>
**Category**: <<rubric_category>>
**Explanation**: <<rubric_explanation>>
## Your Task
Score both Document A (content to evaluate) and Document B (reference content) on this specific rubric criterion using the 0-10 scoring scale provided in the evaluation framework.
## Required Response Format
Provide your evaluation in the following JSON format:
```json
{
"reason": "Detailed explanation with specific evidence from both documents evaluating their performance against the rubric",
"score_a": 0-10,
"score_b": 0-10,
"confidence": 0.0-1.0
}
```
Where:
- score_a: The score for Document A (content to evaluate)
- score_b: The score for Document B (reference content)
Ensure your final answer is wrapped in the JSON code block."""