Commit bd9b242
fix(gatekeeper): pass a looser schema to the model (#462)
In evals, Claude Opus and Claude Sonnet were performing *much*
worse when run directly from Anthropic than when run from Vertex AI
(and worse than other flagship models).
Switching the schema passed to response_format so that the
reason is optional first greatly improves the Claude models on the
Anthropic platform (Opus from 76.7% => 93.2%) and has little
noticeable effect on other models.
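
As a rough illustration of what "looser" could mean here, the sketch below derives a schema in which the reason is no longer required. The field names (`status`, `reason`) and the helper `loosen_schema` are hypothetical and are not taken from the gatekeeper code; the actual change may be structured differently.

```python
import copy

# Hypothetical strict schema: both fields are required.
# The field names are assumptions for illustration only.
STRICT_SCHEMA = {
    "type": "object",
    "properties": {
        "status": {"type": "string"},
        "reason": {"type": "string"},
    },
    "required": ["status", "reason"],
}


def loosen_schema(schema, optional_fields):
    """Return a copy of `schema` with `optional_fields` dropped from "required".

    The properties themselves are kept, so the model may still supply
    a reason; it is just no longer forced to.
    """
    loose = copy.deepcopy(schema)
    loose["required"] = [
        name for name in loose.get("required", []) if name not in optional_fields
    ]
    return loose


# Looser variant that would be handed to the model via response_format.
LOOSE_SCHEMA = loosen_schema(STRICT_SCHEMA, {"reason"})
```

Under this sketch, the strict schema can still be used to validate the final result on the caller's side, while only the looser one is shown to the model.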
Theory here is:
- The response_format isn't ending up in the prompt for Vertex AI,
though it may constrain decoding.
- When the response_format does end up in the prompt, being required
to provide a reason "intimidates" the model, and responding OK
seems easier.
1 parent 18509ca commit bd9b242
3 files changed
Lines changed: 3 additions & 11 deletions
File tree
- src/linux_mcp_server/gatekeeper
- tests/gatekeeper