Hi, thanks for maintaining this collection of machine learning lists.
I would like to suggest one project that might fit under the LLM, RAG, evaluation, or debugging resources:
WFGY is a text-only framework that you feed into any strong LLM. It tries to treat LLM pipelines as systems that can be debugged, not just prompted.
A few parts that might be relevant to your readers:
- WFGY 2.0 ProblemMap (RAG and LLM failure modes)
  https://github.com/onestardao/WFGY/tree/main/ProblemMap
  - 16 canonical failure modes for RAG and LLM applications (hallucination, chunk drift, retrieval collapse, entropy collapse, bootstrap ordering, deployment deadlock, etc.).
  - Each problem has its own page with failure patterns, diagnosis procedure and proposed fixes.
  - Used as a checklist when people wonder “is this a vector store problem, a prompt problem, or a data problem”.
- WFGY 1.0 technical report (math + structure in prompts)
  PDF inside the repo with a more formal description of the math-in-prompt idea and quantitative benchmarks.
- WFGY 3.0 Singularity Demo (advanced, for researchers)
  A TXT pack with 131 “S-class” tension-based questions for long-horizon reasoning and evaluation. This is more experimental, but some researchers use it as a stress test for LLM reasoning.
If you feel this is in scope, I would be happy to open a PR and place it in the section you think fits best.