Hi, thank you for putting together this database of SDKs, frameworks, and tools for AI agents.
I would like to suggest adding WFGY, which is an open source framework for debugging, evaluating, and stress-testing LLM and agent pipelines, with a strong focus on RAG and vector-database failure modes.
What WFGY is:
- A text-driven debugging framework, not a hosted SaaS. It runs on top of any GPT-4-class model and guides the model through a structured seven-step reasoning and diagnosis flow.
- A set of 16 named failure patterns for RAG and LLM apps (retrieval quality, vectorstore index issues, prompt injection, deployment gaps, etc.) with practical “fix playbooks”.
- A TXT evaluation pack (WFGY 3.0 – Singularity Demo) with 131 long-horizon stress-test problems that can be loaded into any agent environment to probe reasoning, robustness, and hallucination behaviour.
Why it may fit this list:
- The repo explicitly includes frameworks, libraries, and tools for creating, monitoring, debugging, and deploying agents. WFGY sits in the “debugging / evaluation / observability” space, similar in spirit to entries like LangSmith or Langfuse, but implemented as an open source text pack rather than a SaaS product.
- It is provider-agnostic and can be wrapped into existing SDKs or toolchains by simply loading the TXT pack and following the documented evaluation flow.
- The project is MIT-licensed and has been actively maintained and used in real debugging work, so people can integrate it or fork it for their own stacks.
If you think this is in scope for the list, I am happy to prepare a PR that adds a concise one-line entry in the most appropriate section (probably under debugging / evaluation tools for AI agents).
Thanks again for maintaining the repo.
Hi, thank you for putting together this database of SDKs, frameworks, and tools for AI agents.
I would like to suggest adding WFGY, which is an open source framework for debugging, evaluating, and stress-testing LLM and agent pipelines, with a strong focus on RAG and vector-database failure modes.
What WFGY is:
Why it may fit this list:
If you think this is in scope for the list, I am happy to prepare a PR that adds a concise one-line entry in the most appropriate section (probably under debugging / evaluation tools for AI agents).
Thanks again for maintaining the repo.