feat: publish evaluation report on the website by raifdmueller · Pull Request #344 · LLM-Coding/Semantic-Anchors

raifdmueller · 2026-03-24T21:07:07Z

Summary

- Copy evaluations/report.html to website/public/evaluation-report.html during build
- Add link on the evaluations concept page (#/evaluations)
- Report is self-contained HTML (inline CSS, no external dependencies)

Shows: Claude Sonnet 99%, GPT-4o 97%, Mistral Large 96% across 193 questions.

Test plan

- Build copies report to public dir
- All 88 unit tests pass
- Verify link works on evaluations page

🤖 Generated with Claude Code

Summary by CodeRabbit

Release notes

New features
- Added an evaluation report page with detailed model comparisons, performance metrics, and error analysis.
Documentation
- Added a direct link to the latest evaluation results in the documentation.

- Copy evaluations/report.html to website/public/ during build
- Add "View the latest evaluation results" link on the evaluations page
- Report accessible at /evaluation-report.html

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-03-24T21:07:24Z

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This pull request adds a link to an evaluation report to the documentation, extends the build script with a copy step for an HTML file, and provides a new static evaluation report page.

Changes

Cohort / File(s)	Summary
Documentation `docs/anchor-evaluations.adoc`	Adds a hyperlink to an external evaluation report (`../evaluation-report.html`) that points directly to the latest evaluation results.
Build script `scripts/render-docs.js`	Implements a prebuild step that checks for `evaluations/report.html` and, if it exists, copies it to `website/public/evaluation-report.html`.
Static content `website/public/evaluation-report.html`	New static HTML page with a "Semantic Anchor Evaluation Report", including a tabular summary of model performance, a heatmap of anchors and results, and detailed metadata about the test run.
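
The prebuild copy step described in the table can be sketched roughly as follows. This is a minimal illustration, not the code from `scripts/render-docs.js`: the function name `copyEvaluationReport` and its `rootDir` parameter are hypothetical, and only the two file paths are taken from the PR description.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper (name and signature are assumptions, not from the PR).
// Copies evaluations/report.html to website/public/evaluation-report.html
// if the source file exists. Returns true when a copy was made.
function copyEvaluationReport(rootDir) {
  const source = path.join(rootDir, 'evaluations', 'report.html');
  const target = path.join(rootDir, 'website', 'public', 'evaluation-report.html');

  // Skip silently when no report has been generated yet.
  if (!fs.existsSync(source)) {
    return false;
  }

  // Make sure the public directory exists before copying.
  fs.mkdirSync(path.dirname(target), { recursive: true });
  fs.copyFileSync(source, target);
  return true;
}
```

Returning a boolean instead of throwing lets a build continue when no report is present, which matches the "check, and copy if it exists" behavior described above.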

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~4 minutes

Possibly related PRs

docs: add semantic anchor evaluation concept #328: Both PRs modify the same evaluation documents and the documentation rendering pipeline; the related PR adds the base file docs/anchor-evaluations.adoc, while this PR adds a link to it and implements the copy logic for the evaluation report.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The pull request title clearly and concisely describes the main change: publishing an evaluation report on the website.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

rdmueller merged commit b5e6196 into LLM-Coding:main Mar 24, 2026
5 of 7 checks passed

coderabbitai Bot mentioned this pull request Mar 26, 2026

feat: evaluate 6 models including Mistral Small/Medium/Devstral #353

Merged

3 tasks

rdmueller merged 1 commit into LLM-Coding:main from raifdmueller:feat/evaluation-report-page

2 participants
