Summary
Define the observable contract for latency, cost, correctness, and degraded-mode behavior.
This issue was generated from an org-wide EvalOps mining pass on 2026-05-10 07:57 UTC. It combines live GitHub repo signals with a per-repo arXiv search. Treat the research links as grounding for a concrete implementation, not as a request for a literature review.
Repo Evidence
- Repository description: OpenAPI 3.x to MCP server bridge in TypeScript with stdio, StreamableHTTP, and SSE transports
- Tree signals: 0 docs files, 2 workflows, 0 proto files, 14 test-like files.
- Anchor files sampled: README.md, package.json
Research Grounding
Repo axes: tooling, security, evaluation, governance
Search keywords: run, assert, const, npm, test, await, openapi, import, match, json, node, spec
- arXiv:2602.01129v1 SMCP: Secure Model Context Protocol (Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yanjie Zhao, Cai Fu), 2026.
- arXiv:2508.07575v1 MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark (Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo), 2025.
- arXiv:2407.00121v1 Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks (Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Sadhana Kumaravel, Matthew Stallone, Rameswar Panda), 2024.
- arXiv:2410.17950v1 Benchmarking Floworks against OpenAI & Anthropic: A Novel Framework for Enhanced LLM Function Calling (Nirav Bhan, Shival Gupta, Sai Manaswini, Ritik Baba, Narun Yadav, Hillori Desai), 2024.
- arXiv:2602.18764v2 The Convergence of Schema-Guided Dialogue Systems and the Model Context Protocol (Andreas Schlapbach), 2026.
- arXiv:2507.19570v1 MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization (Yiting Wang, Wanghao Ye, Yexiao He, Yiran Chen, Gang Qu, Ang Li), 2025.
- arXiv:2601.22129v2 SWE-Replay: Efficient Test-Time Scaling for Software Engineering Agents (Yifeng Ding, Lingming Zhang), 2026.
- arXiv:2504.00914v1 On the Robustness of Agentic Function Calling (Ella Rabinovich, Ateret Anaby-Tavor), 2025.
- arXiv:2509.20415v2 Online-Optimized RAG for Tool Use and Function Calling (Yu Pan, Xiaocheng Li, Hanzhao Wang), 2025.
- arXiv:2510.25694v1 Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents (Jiayi Kuang, Yinghui Li, Xin Zhang, Yangning Li, Di Yin, Xing Sun), 2025.
What To Build
- Name the key service/user journey SLOs and their required dimensions.
- Emit metrics/log fields for success, failure, cost/latency, and reasoned fallback.
- Add a dashboard/runbook stub or CLI report that makes the new signals operator-visible.
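As a starting point for discussion, the three items above could be sketched roughly as follows. This is a minimal illustration, not code from the repository: the `ToolCallRecord` and `SloReport` shapes, every field name, and the `emit`, `percentile`, and `summarize` helpers are hypothetical, and the use of response size as a cost proxy is an assumption. Records are written to stderr because the stdio transport reserves stdout for protocol messages.

```typescript
// Hypothetical sketch: none of these names come from the mcp-openapi codebase.
type Outcome = "success" | "failure" | "fallback";

// One record per bridged tool call; these fields are the assumed SLO dimensions.
interface ToolCallRecord {
  tool: string; // MCP tool name derived from an OpenAPI operation
  transport: "stdio" | "streamable-http" | "sse";
  outcome: Outcome;
  latencyMs: number;
  upstreamStatus?: number; // HTTP status returned by the bridged API, if any
  responseBytes?: number; // assumed cost proxy
  fallbackReason?: string; // expected whenever outcome is "fallback"
}

interface SloReport {
  total: number;
  successRate: number;
  fallbackRate: number;
  p95LatencyMs: number;
}

// Nearest-rank percentile over an unsorted sample; returns 0 for an empty sample.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
}

// Aggregate records into the numbers an operator-facing report would show.
function summarize(records: ToolCallRecord[]): SloReport {
  const total = records.length;
  const count = (o: Outcome): number =>
    records.filter((r) => r.outcome === o).length;
  return {
    total,
    successRate: total === 0 ? 0 : count("success") / total,
    fallbackRate: total === 0 ? 0 : count("fallback") / total,
    p95LatencyMs: percentile(records.map((r) => r.latencyMs), 95),
  };
}

// Emit one JSON line per call on stderr and return it for testing.
function emit(record: ToolCallRecord): string {
  const line = JSON.stringify({ ts: new Date().toISOString(), ...record });
  console.error(line);
  return line;
}
```

A CLI report could read these JSON lines back, pass them to `summarize`, and print the result. Whether p95 latency and fallback rate are the right SLO targets is a decision for the implementer.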
Acceptance Criteria
Notes
- Generated issue 4/5 for evalops/mcp-openapi by evalops_org_miner.py.
- Before implementation, confirm the sampled latent-spec snippets still match main; this issue intentionally cites exact file paths/lines where the mining pass saw them.