docs/BLOG_POST.md
Lines changed: 9 additions & 0 deletions
@@ -155,6 +155,13 @@ I built an information retrieval evaluation pipeline alongside the task scoring
The refreshed retrieval pipeline run confirms moderate retrieval quality overall (file recall 0.460, MRR 0.364), but a large fraction of traces (488/799) still lack mapped ground-truth files, which limits configuration-level retrieval comparisons.
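As a rough illustration of how metrics like these are typically aggregated, here is a minimal Python sketch. It assumes the standard definitions of file recall (the share of ground-truth files that were retrieved) and reciprocal rank (1 divided by the rank of the first relevant file), and it skips traces with no mapped ground truth. The function names and the trace format are hypothetical and may not match the actual pipeline.

```python
from typing import Optional


def file_recall(retrieved: list[str], ground_truth: set[str]) -> Optional[float]:
    """Fraction of ground-truth files that appear in the retrieved list."""
    if not ground_truth:
        return None  # no mapped ground truth: metric is not computable
    return len(ground_truth & set(retrieved)) / len(ground_truth)


def reciprocal_rank(retrieved: list[str], ground_truth: set[str]) -> Optional[float]:
    """1 / rank of the first relevant file, or 0.0 if none was retrieved."""
    if not ground_truth:
        return None
    for rank, path in enumerate(retrieved, start=1):
        if path in ground_truth:
            return 1.0 / rank
    return 0.0


def summarize(traces: list[tuple[list[str], set[str]]]) -> dict:
    """Average both metrics over the traces that have ground truth."""
    recalls, rrs = [], []
    for retrieved, ground_truth in traces:
        recall = file_recall(retrieved, ground_truth)
        if recall is None:
            continue  # excluded from the computable subset
        recalls.append(recall)
        rrs.append(reciprocal_rank(retrieved, ground_truth))
    n = len(recalls)
    return {
        "n_total": len(traces),
        "n_computable": n,
        "file_recall": sum(recalls) / n if n else None,
        "mrr": sum(rrs) / n if n else None,
    }


# Hypothetical traces: (ranked retrieved files, ground-truth files)
traces = [
    (["a.py", "b.py", "c.py"], {"b.py", "d.py"}),  # recall 0.5, RR 0.5
    (["x.py"], {"x.py"}),                          # recall 1.0, RR 1.0
    (["y.py"], set()),                             # no ground truth, skipped
]
summary = summarize(traces)
print(summary)
```

Reporting `n_computable` alongside the averages keeps it visible how much of the trace set the numbers actually cover.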

+
+On the computable subset, aggregated baseline vs MCP retrieval metrics are:
+
+| Config Type | n | File Recall | MRR | MAP | Context Efficiency |
But better retrieval doesn't always mean better outcomes. I'm still investigating this, but finding the right files is likely necessary but not sufficient. The agent still has to correctly apply what it finds, and the local code modification step is where removing local code availability from the MCP run environment hurts, in some tasks more than others.
## Patterns in the Retrieval-Outcome Pairing Data
@@ -175,6 +182,8 @@ Let's take a break from whatever voodoo variables control reward outcomes and ta
This updated snapshot indicates MCP token/tool usage overhead is currently dominating cost in the analysis set.

+
+Suite-level cost is mixed: MCP is cheaper on several Org suites (for example crossorg −$0.062/task and incident −$0.048/task) but more expensive on some SDLC suites (refactor +$0.398/task, feature +$0.211/task). The full per-suite cost table is in the technical report.