Welcome to the official repository of <strong>SciEval</strong>!
@@ -33,6 +31,8 @@ Its design is shaped by following core ideas:
- **Capability‑oriented & reproducible ▸** A unified toolkit for **dataset construction, prompt engineering, inference, and expert‑aligned scoring** ensures transparent and repeatable comparisons.
- **Grounded in real scenarios ▸** Benchmarks use domain‑specific data and tasks so performance reflects **actual scientific practice**, not synthetic proxies.
+For a detailed and systematic introduction to SciEvalKit, please refer to the [SciEvalKit Tutorial](https://scievalkit-docs.readthedocs.io/en/latest/Quickstart.html).
+
## <img src="assets/icon/progress.png" alt="progress" height="28" style="vertical-align:middle;" /> Progress in Scientific Intelligence
@@ -42,6 +42,7 @@ Its design is shaped by following core ideas:
- **General benchmarks overestimate scientific competence.** Even the strongest frontier models (e.g., **Gemini 3 Pro**) score below **60** on **Scientific Text Capability**, despite scoring near *90* on widely used general‑purpose benchmarks.
- **Multimodal capability is breaking the 60‑point barrier.** **Gemini 3 Pro** leads **Scientific Multimodal Capability** with **62.88**, reflecting strong performance in multimodal perception and reasoning.
- **Open‑source systems are rapidly closing the gap.** *Qwen3‑VL‑235B‑A22B* and *Qwen3‑Max* now match or surpass several proprietary models in symbolic reasoning and code generation, signalling healthy community progress.
@@ -54,6 +55,7 @@ Its design is shaped by following core ideas:
Get from clone to first scores in minutes—see our local [QuickStart](docs/en/Quickstart.md) / [快速开始](docs/zh-CN/Quickstart.md) guides, or consult the [VLMEvalKit tutorial](https://vlmevalkit.readthedocs.io/en/latest/Quickstart.html) for additional reference.
+Get from clone to first scores in minutes—see our local [QuickStart](docs/en/Quickstart.md) / [快速开始](docs/zh-CN/Quickstart.md) guides, or consult the [SciEvalKit Tutorial](https://scievalkit-docs.readthedocs.io/en/latest/Quickstart.html) for additional reference.