Suggestion: add WFGY as an open source debugging and evaluation framework for LLM agents #41

@onestardao

Hi, thank you for putting together this database of SDKs, frameworks, and tools for AI agents.

I would like to suggest adding WFGY, an open source framework for debugging, evaluating, and stress-testing LLM and agent pipelines, with a strong focus on RAG and vector-database failure modes.

What WFGY is:

  • A text-driven debugging framework, not a hosted SaaS. It runs on top of any GPT-4-class model and guides the model through a structured seven-step reasoning and diagnosis flow.
  • A set of 16 named failure patterns for RAG and LLM apps (retrieval quality, vector store index issues, prompt injection, deployment gaps, etc.), each with a practical “fix playbook”.
  • A TXT evaluation pack (WFGY 3.0 – Singularity Demo) with 131 long-horizon stress-test problems that can be loaded into any agent environment to probe reasoning, robustness, and hallucination behaviour.

Why it may fit this list:

  • This list explicitly covers frameworks, libraries, and tools for creating, monitoring, debugging, and deploying agents. WFGY sits in the “debugging / evaluation / observability” space, similar in spirit to entries like LangSmith or Langfuse, but implemented as an open source text pack rather than a SaaS product.
  • It is provider-agnostic and can be wrapped into existing SDKs or toolchains by simply loading the TXT pack and following the documented evaluation flow.
  • The project is MIT-licensed, actively maintained, and already used in real debugging work, so people can integrate it or fork it for their own stacks.
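
To make the "load the TXT pack and wrap it into an existing toolchain" point concrete, here is a minimal Python sketch. It is only an illustration under assumptions, not WFGY's documented API: `load_pack`, `make_evaluator`, `ModelFn`, and `echo_model` are hypothetical names, and the sketch assumes the pack is passed to the model as a system prompt.

```python
from pathlib import Path
from typing import Callable

# Hypothetical type: any provider call that maps
# (system_prompt, user_message) to a text reply.
ModelFn = Callable[[str, str], str]


def load_pack(path: str) -> str:
    """Read a TXT pack from disk as plain UTF-8 text."""
    return Path(path).read_text(encoding="utf-8")


def make_evaluator(pack_text: str, model: ModelFn) -> Callable[[str], str]:
    """Return a function that sends each question with the pack as context.

    Assumption: the pack is supplied as the system prompt. The actual
    evaluation flow is whatever the project documentation describes.
    """
    def evaluate(question: str) -> str:
        return model(pack_text, question)

    return evaluate


def echo_model(system_prompt: str, user_message: str) -> str:
    """Stand-in model for a dry run; swap in a real provider call."""
    return f"[{len(system_prompt)} chars of context] {user_message}"
```

Because the model is just a callable, the same wrapper should work with whichever SDK a given stack already uses: replace `echo_model` with a function that forwards the two strings to that provider.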

If you think this is in scope for the list, I am happy to prepare a PR that adds a concise one-line entry in the most appropriate section (probably under debugging / evaluation tools for AI agents).

Thanks again for maintaining the repo.
