Commit e454d4a

update figs
1 parent eca5d17 commit e454d4a

6 files changed

Lines changed: 6 additions & 58 deletions

README.md

Lines changed: 5 additions & 57 deletions
@@ -18,8 +18,8 @@
 <a href="https://huggingface.co/datasets/opencompass/VerifierBench" target="_blank" style="margin: 2px;">
 <img alt="Hugging Face Dataset" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-ff9800?color=ff9800&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
-<a href="https://creativecommons.org/licenses/by-sa/4.0/" style="margin: 2px;">
-<img alt="License" src="https://img.shields.io/badge/License-CC%20BY--SA%204.0-f5de53?color=f5de53&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
+<a href="https://www.apache.org/licenses/LICENSE-2.0" style="margin: 2px;">
+<img alt="License" src="https://img.shields.io/badge/License-Apache%202.0-blue.svg?color=blue&logo=apache&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
 </a>
 </div>
 
@@ -30,7 +30,7 @@
 
 
 <p align="center">
-<img src="https://cdn-uploads.huggingface.co/production/uploads/614ffea450eec00bf3c23652/gezsWZn0CxCc423gW5UMO.png" alt="Test Set Results" width="600" height="400">
+<img src="assets/model_performance.png" alt="Test Set Results" width="600" height="400">
 </p>
 
 ## Get the model and dataset from 🤗
@@ -389,44 +389,8 @@ CompassVerifier performance (F1 score w/o COT) on our new released [VerifierBen
 <td style="text-align: right;">62.6</td>
 <td style="text-align: right;">67.1</td>
 </tr>
-
-<tr>
-<td colspan="6" style="text-align: center;"><strong><em>CompassVerifier (Qwen3)</em></strong></td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-1.7B</td>
-<td style="text-align: right;">87.1</td>
-<td style="text-align: right;">89.4</td>
-<td style="text-align: right;">63.0</td>
-<td style="text-align: right;">80.2</td>
-<td style="text-align: right;">80.0</td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-8B</td>
-<td style="text-align: right;">86.7</td>
-<td style="text-align: right;">90.7</td>
-<td style="text-align: right;">75.7</td>
-<td style="text-align: right;">79.3</td>
-<td style="text-align: right;">83.1</td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-14B</td>
-<td style="text-align: right;">90.3</td>
-<td style="text-align: right;">91.4</td>
-<td style="text-align: right;">79.1</td>
-<td style="text-align: right;">82.9</td>
-<td style="text-align: right;">85.9</td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-32B</td>
-<td style="text-align: right;">89.6</td>
-<td style="text-align: right;">92.3</td>
-<td style="text-align: right;">79.8</td>
-<td style="text-align: right;">83.0</td>
-<td style="text-align: right;">86.2</td>
-</tr>
 <tr>
-<td colspan="6" style="text-align: center;"><strong><em>CompassVerifier (Qwen2.5)</em></strong></td>
+<td colspan="6" style="text-align: center;"><strong><em>CompassVerifier</em></strong></td>
 </tr>
 <tr>
 <td style="text-align: left;">CompassVerifier-3B</td>
@@ -534,24 +498,8 @@ We also test the performance of CompassVerifier on [VerifyBench](https://arxiv.o
 <td style="text-align: right;">-</td>
 </tr>
 <tr>
-<td colspan="5" style="text-align: center;"><strong><em>CompassVerifier (Qwen3)</em></strong></td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-1.7B</td>
-<td style="text-align: right;">80.1</td>
-<td style="text-align: right;">69.3</td>
-<td style="text-align: right;">72.9</td>
-<td style="text-align: right;">61.0</td>
-</tr>
-<tr>
-<td style="text-align: left;">CompassVerifier-8B</td>
-<td style="text-align: right;">84.5</td>
-<td style="text-align: right;">72.7</td>
-<td style="text-align: right;">79.2</td>
-<td style="text-align: right;">55.4</td>
-</tr>
 <tr>
-<td colspan="5" style="text-align: center;"><strong><em>CompassVerifier (Qwen2.5)</em></strong></td>
+<td colspan="5" style="text-align: center;"><strong><em>CompassVerifier</em></strong></td>
 </tr>
 <tr>
 <td style="text-align: left;">CompassVerifier-3B</td>

assets/model_compare.png

-218 KB
Binary file not shown.

assets/model_performance.png

1.34 MB

docs/assets/model_compare.png

-218 KB
Binary file not shown.

docs/assets/model_performance.png

1.34 MB

docs/index.html

Lines changed: 1 addition & 1 deletion
@@ -201,7 +201,7 @@ <h2 class="title is-3">Introduction</h2>
 <!-- Model Comparison Figure -->
 <div class="columns is-centered">
 <div class="column is-four-fifths has-text-centered">
-<img src="assets/model_compare.png" alt="Model Comparison" width="80%"/>
+<img src="assets/model_performance.png" alt="Model Comparison" width="80%"/>
 <p class="mt-3">
 Performance comparison of CompassVerifier with other models across different domains on VerifierBench.
 </p>
