
Commit b751233 (parent 770c403)

Commit message: updating the model evaluation metrics document

1 file changed: 5 additions, 2 deletions

File changed: 0_domain_study/model_evaluation_metrics.md
@@ -4,7 +4,7 @@
 |-----------------|-------------|---------------|--------------------------|
 |Reasoning / Logic|Mathematical reasoning|GSM8K|(correct answers / total)|
 |Commonsense QA|Everyday reasoning and knowledge|PIQA, BoolQ|Accuracy|
-|Summarization|condense information|CNN/DailyMail, XSum|ROUGE-L, BERTScore|
+|Summarization|Condensing information|CNN/DailyMail, XSum|ROUGE-L, BERTScore|
 |Code Generation|Logical structure|HumanEval-lite, MBPP|Pass@k|

 ## Datasets
@@ -40,7 +40,10 @@ id: BBC ID of the article.
 It is an English-language dataset containing just over
 300k unique news articles as written by journalists at CNN and the Daily Mail.
 The current version supports both extractive and abstractive summarization.
-The HumanEval dataset released by OpenAI includes 164 programming problems
+
+### The HumanEval Dataset
+
+The HumanEval dataset released by OpenAI includes 164 programming problems
 with a function signature, docstring, body, and several unit tests.
 They were handwritten to ensure they were not included in the training
 set of code generation models.
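The Pass@k metric listed in the table above has a standard unbiased estimator (introduced alongside HumanEval): given n generated samples per problem of which c pass the unit tests, it estimates the probability that at least one of k drawn samples is correct. A minimal sketch in Python (function and variable names are my own):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    n: total generated samples per problem
    c: number of samples that pass all unit tests
    k: samples drawn per problem
    Returns the estimated probability that at least one of k
    draws (without replacement) from the n samples is correct.
    """
    if n - c < k:
        # Every possible k-subset contains at least one correct sample.
        return 1.0
    # 1 - P(all k drawn samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 3 correct, k=1 -> 3/10 chance of drawing a correct one.
print(pass_at_k(10, 3, 1))  # 0.3
```

The per-problem scores are then averaged over the benchmark. Computing `1 - C(n-c, k) / C(n, k)` rather than naively sampling k completions keeps the estimate unbiased and low-variance.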
