Commit 11c76e8

Revise test results for new models
Updated the weighted mean scores for newly added tests in the results.
1 parent 1bdd9ea commit 11c76e8

1 file changed

Lines changed: 1 addition & 1 deletion

experiments/kdd 2026/new_tests_results.md

@@ -3,7 +3,7 @@
 Assertion-weighted mean scores (0-100) on the **36 newly added tests** only: 17 Box, 5 Google Calendar, 7 Linear, and 6 Slack. All runs included API documentation.
 
 | Model | Weighted Mean
-|---|---|---|
+|---|---|
 | openai/gpt-5 | 88.10
 | openai/gpt-5-mini | 87.61
 | deepseek/deepseek-v3.2 | 84.26
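
The results file reports "assertion-weighted mean scores" but this diff does not show how they are computed. The sketch below is one plausible reading, not the repository's actual scoring code: it assumes each test yields a score on the 0-100 scale plus a count of assertions, and that each test's score is weighted by that count. The function name `assertion_weighted_mean` and the sample data are hypothetical.

```python
from typing import Iterable, Tuple


def assertion_weighted_mean(results: Iterable[Tuple[float, int]]) -> float:
    """Average per-test scores, weighting each test by its assertion count.

    `results` holds (score, n_assertions) pairs, with score on a 0-100 scale.
    This weighting scheme is an assumption; the source does not define it.
    """
    results = list(results)
    total_assertions = sum(n for _, n in results)
    if total_assertions == 0:
        raise ValueError("no assertions to weight by")
    return sum(score * n for score, n in results) / total_assertions


# Hypothetical data: three tests with 4, 2, and 2 assertions.
example = [(100.0, 4), (50.0, 2), (75.0, 2)]
print(round(assertion_weighted_mean(example), 2))  # 81.25
```

Under this reading, tests with more assertions move the mean more than tests with few, so the figure can differ from a plain per-test average.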
