Commit db9f0f6

update readme
1 parent 8084020 commit db9f0f6

4 files changed

Lines changed: 6 additions & 4 deletions

README.md

@@ -7,12 +7,10 @@ A unified evaluation toolkit and leaderboard for rigorously assessing the scient
 <hr style="width:100%;margin:16px 0;border:0;border-top:0.1px solid #d0d7de;" />
 
 <div align="center">
-
-[![Website](https://img.shields.io/badge/Website-SciEval-b8dcff?style=for-the-badge&logo=google-chrome&logoColor=white)](https://opencompass.org.cn/Intern-Discovery-Eval)&#160;
+[![Tutorial](https://img.shields.io/badge/Tutorial-SciEval-b8dcff?style=for-the-badge&logo=google-chrome&logoColor=white)](https://scievalkit-docs.readthedocs.io/zh-cn/latest)&#160;
 [![Leaderboard](https://img.shields.io/badge/LEADERBOARD-Scieval-f6e58d?style=for-the-badge&logo=huggingface)](https://opencompass.org.cn/Intern-Discovery-Eval/rank)&#160;
 [![Report](https://img.shields.io/badge/REPORT-Technical-f4c2d7?style=for-the-badge)](https://arxiv.org/abs/2512.22334)&#160;
 [![GitHub](https://img.shields.io/badge/GitHub-Repository-c7b9e2?style=for-the-badge&logo=github&logoColor=white)](https://github.com/InternScience/SciEvalKit)
-
 <img src="assets/icon/welcome.png" alt="welcome" height="24" style="vertical-align:middle;" />
 &nbsp;Welcome to the official repository of <strong>SciEval</strong>!

@@ -33,6 +31,8 @@ Its design is shaped by following core ideas:
 - **Capability‑oriented & reproducible ▸** A unified toolkit for **dataset construction, prompt engineering, inference, and expert‑aligned scoring** ensures transparent and repeatable comparisons.
 - **Grounded in real scenarios ▸** Benchmarks use domain‑specific data and tasks so performance reflects **actual scientific practice**, not synthetic proxies.
 
+For a detailed and systematic introduction to SciEvalKit, please refer to the [SciEvalKit Tutorial](https://scievalkit-docs.readthedocs.io/en/latest/Quickstart.html).
+
 
 ## <img src="assets/icon/progress.png" alt="progress" height="28" style="vertical-align:middle;" />&nbsp;Progress in Scientific Intelligence

@@ -42,6 +42,7 @@ Its design is shaped by following core ideas:
 <img src="assets/general_scientific_comparison.png" alt="SciEval capability radar" width="100%">
 </div>
 
+
 - **General benchmarks overestimate scientific competence.** Even the strongest frontier models (e.g., **Gemini 3 Pro**) score below **60** on **Scientific Text Capability** , despite scoring near *90* on widely used general‑purpose benchmarks.
 - **Multimodal capability is breaking the 60‑point barrier.** **Gemini 3 Pro** leads **Scientific Multimodal Capability** with **62.88**, reflecting strong performance in multimodal perception and reasoning.
 - **Open‑source systems are rapidly closing the gap.** *Qwen3‑VL‑235B‑A22B* and *Qwen3‑Max* now match or surpass several proprietary models in symbolic reasoning and code generation, signalling healthy community progress.
@@ -54,6 +55,7 @@ Its design is shaped by following core ideas:
 <img src="assets/radar.png" alt="SciEval capability radar" width="70%">
 </div>
 
+
 | Category | Highlights |
 | ----------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | **Seven Core Dimensions** | Scientific Knowledge Understanding, Scientific Code Generation, Scientific Symbolic Reasoning, Scientific Hypothesis Generation, Scientific Multimodal Perception, Scientific Multimodal Reasoning, Scientific Multimodal Understanding |
@@ -85,7 +87,7 @@ Its design is shaped by following core ideas:
 
 ## <img src="assets/icon/start.png" alt="start" height="28" style="vertical-align:middle;" />&nbsp;Quick Start
 
-Get from clone to first scores in minutes&mdash;see our local [QuickStart](docs/en/Quickstart.md) / [快速开始](docs/zh-CN/Quickstart.md) guides, or consult the [VLMEvalKit tutorial](https://vlmevalkit.readthedocs.io/en/latest/Quickstart.html) for additional reference.
+Get from clone to first scores in minutes&mdash;see our local [QuickStart](docs/en/Quickstart.md) / [快速开始](docs/zh-CN/Quickstart.md) guides, or consult the [SciEvalKit Tutorial](https://scievalkit-docs.readthedocs.io/en/latest/Quickstart.html) for additional reference.
 
 ### 1 · Install

-312 KB

assets/github.png

-1.71 MB
Binary file not shown.

assets/radar.png

-591 KB
