|[MLAgentBench](https://github.com/snap-stanford/MLAgentBench)| Benchmark for evaluating AI agents on ML experimentation | 13 end-to-end ML tasks from CIFAR-10 to BabyLM |
|[AutoAgent](https://github.com/HKUDS/AutoAgent)| Zero-code LLM agent framework with self-play customization | Create agents via natural language, iterative self-improvement |
|[ShinkaEvolve](https://github.com/SakanaAI)| LLM-as-mutation-operator program evolution framework | Evolves programs for scientific discovery |
|[AI-Supervisor](https://arxiv.org/abs/2603.24402)| Autonomous research supervision via persistent Research World Model | Multi-agent consensus + Knowledge Graph; validates claims via GPU computation; self-correcting updates |
|[ARIS](https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep)| Lightweight Markdown-only skills for autonomous ML research overnight | Zero dependencies; cross-model review loops; 20+ GPU experiments per overnight run; works with any LLM agent |
## Agent-Driven Training Skills (HuggingFace Ecosystem)
|[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)| Build RL environments for LLM training | Multi-step/multi-turn environments; interoperable with NeMo RL, OpenRLHF, TRL, Unsloth |
|[rLLM](https://github.com/rllm-org/rllm)| Post-training RL framework for language agents | Custom agents + environments → RL training → deployment; rLLM-FinQA-4B beats Qwen3-235B |
|[RAGEN](https://github.com/RAGEN-AI/RAGEN)| Multi-turn RL framework for training reasoning agents | StarPO framework; 10 built-in environments; identifies "Echo Trap" instability |
|[f-GRPO](https://github.com/rhaldarpurdue/f-GRPO)| f-Divergence based GRPO for general LLM alignment | KL/Reverse KL/Pearson/Hellinger/JS divergences; superior on both RLVR (math) and safety alignment; built on Unsloth |
|[Tree-GRPO](https://github.com/AMAP-ML/Tree-GRPO)| Tree search for LLM agent RL (ICLR 2026) | 4x less rollout budget via shared prefixes; step-wise process supervision from outcome reward; tree-structured ReAct |
|[SimpleRL-Reason](https://github.com/hkust-nlp/simpleRL-reason)| Simple RL recipe for reasoning (HKUST) | DeepSeek-R1-style; 7B achieves 33.3% AIME with only 8K examples; no SFT needed |
|[SWE-RL](https://github.com/facebookresearch/swe-rl)| Meta's RL for software engineering reasoning | Llama3-SWE-RL-70B achieves 41% on SWE-bench Verified (NeurIPS 2025) |
|[OpenManus-RL](https://github.com/OpenManus/OpenManus-RL)| RL tuning for LLM agents (UIUC + MetaGPT) | PPO-based; AgentGym environments + verl training |
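Several entries above (RAGEN, f-GRPO, Tree-GRPO, SimpleRL-Reason) build on GRPO, which drops PPO's learned critic in favor of a group-relative baseline: sample several completions per prompt, then standardize each completion's reward against the group mean and standard deviation. A minimal sketch of that advantage computation, with an RLVR-style binary reward (function names here are illustrative, not an API from any of the listed libraries):

```python
import math

def grpo_advantages(group_rewards, eps=1e-6):
    """Group-relative advantages as used in GRPO-style training.

    Each prompt gets G sampled completions; a completion's advantage
    is its reward standardized against the group, so no learned
    value/critic network is needed.
    """
    g = len(group_rewards)
    mean = sum(group_rewards) / g
    var = sum((r - mean) ** 2 for r in group_rewards) / g
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in group_rewards]

# Example: 4 completions for one math prompt, reward = 1.0 if the
# final answer is verifiably correct (RLVR-style), else 0.0.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions get positive advantages and incorrect ones negative, summing to roughly zero across the group; the f-GRPO and Tree-GRPO entries vary the regularizer and rollout structure around this same core.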
|[InstructLab SDG](https://github.com/instructlab/sdg)| Synthetic data via LAB methodology (IBM/Red Hat) | Skills-SDG + Knowledge-SDG; minimal seed taxonomy → large-scale data |
|[Persona Hub](https://github.com/tencent-ailab/persona-hub)| Persona-driven synthetic data at billion scale (Tencent) | 1B diverse personas; 370M elite personas released |
|[synth_gen](https://github.com/facebookresearch/synth_gen)| Execution-verified synthetic data (Meta) | Modular verifier system; parser-based verification for code |
|[Evidently](https://github.com/evidentlyai/evidently)| Open-source synthetic data generation with user profiles | Model-agnostic; customizable personas & goals; no-code UI in Evidently Cloud; outputs to pandas DataFrame |
|[NVIDIA Nemotron-4 340B](https://blogs.nvidia.com/blog/nemotron-4-synthetic-data-generation-llm-training/)| Open models for synthetic data generation pipeline | Base + Instruct + Reward models; commercial use allowed |
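Persona-driven generation (the approach behind Persona Hub) boils down to crossing a persona pool with task templates so one instruction type yields diverse data. A schematic pipeline, where `call_llm` is a placeholder to be swapped for any chat-completion client, not an API from the tools listed above:

```python
import itertools

PERSONAS = [
    "a machine learning engineer debugging a training loop",
    "a high-school teacher explaining statistics",
]
TEMPLATES = [
    "Write a question about {topic} that this persona would ask: {persona}",
]

def call_llm(prompt: str) -> str:
    # Placeholder: swap in a real chat-completion client here.
    return f"[generated text for: {prompt[:40]}...]"

def generate_dataset(topic: str):
    """Cross every persona with every template, one row per pair."""
    rows = []
    for persona, template in itertools.product(PERSONAS, TEMPLATES):
        prompt = template.format(topic=topic, persona=persona)
        rows.append({"persona": persona, "prompt": prompt,
                     "completion": call_llm(prompt)})
    return rows

data = generate_dataset("gradient clipping")
```

Scaling the persona pool is what drives diversity; the execution-verified and reward-model filtering steps from synth_gen and Nemotron-4 would slot in after `call_llm`.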
### Data Curation & Filtering
|[AutoRound](https://github.com/intel/auto-round)| Advanced quantization via sign-gradient descent (Intel) | High accuracy at 2-4 bits; exports to GPTQ/AWQ/GGUF; broad HW compatibility |
|[NVIDIA Model Optimizer](https://github.com/NVIDIA/Model-Optimizer)| Unified quantization, pruning, distillation & speculative decoding | FP8/INT8/INT4; exports to TensorRT-LLM/vLLM; NeMo Megatron integration |
|[TurboQuant](https://research.google/blog/turboquant-redefining-ai-efficiency-with-extreme-compression/)| Google's KV cache compression (ICLR 2026) | 6x memory reduction at 3-bit with zero accuracy loss; PolarQuant + QJL; 8x perf on H100 |
|[llama.cpp](https://github.com/ggml-org/llama.cpp)| LLM inference in C/C++ with GGUF quantization | Q4_K_M sweet spot: 92% quality, 75% size reduction; runs everywhere |
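The compression ratios quoted above come down to storing weights in a few bits plus a per-group scale. A toy round-trip for symmetric per-tensor INT4 quantization, for intuition only; real schemes such as GPTQ, AWQ, and llama.cpp's Q4_K_M add grouping, zero-points, and calibration on top of this:

```python
def quantize_int4(weights):
    """Symmetric per-tensor INT4: map floats to integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.3, 0.07, 2.1, -0.55]
q, s = quantize_int4(w)
w_hat = dequantize(q, s)
# When no clipping occurs, the round-trip error per weight is
# bounded by half a quantization step (scale / 2).
```

Each weight now costs 4 bits plus an amortized share of one scale, which is where the "75% size reduction" figures for 4-bit formats come from.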
## Lightweight Pretraining & Distributed Training
|[SGLang](https://github.com/sgl-project/sglang)| High-performance serving for LLMs & multimodal |~16,200 tok/sec on H100; RadixAttention; used by slime for RL training |
|[MLRC-Bench](https://openreview.net/forum?id=t8Okk2PRWU)| ML Research Competition challenges | Tests novel methodology development |
|[AgentBench](https://github.com/THUDM/AgentBench)| Multi-dimensional benchmark for LLM agents | Tests across OS, database, knowledge graph, web, and game environments |
|[SWE-bench Verified](https://www.swebench.com/)| Human-verified GitHub issue resolution | Industry standard for coding agents; top scores 70%+ |
|[SERA](https://huggingface.co/collections/allenai/open-coding-agents)| Ai2's open coding agent family | 54.2% on SWE-Bench; trains in 40 GPU-days (~$2K); all open |
|[Cline](https://github.com/cline/cline)| VS Code AI coding agent with 60K+ GitHub stars | MCP tool creation; 5M+ developers; human-in-the-loop approval; native subagents |
Generate data at scale → train efficiently → evaluate comprehensively.
---
## Trends (2026 Q2 Update)
1. **AutoResearch Paradigm**: Karpathy proved "AI autonomously doing ML research" works with just 630 lines of code, now spawning derivatives like ARIS and AI-Supervisor
2. **"Vibe Training"**: HF Skills enables a natural-language-driven model training lifecycle
3. **GRPO Variants Proliferate**: f-GRPO (f-divergence family), Tree-GRPO (tree search, ICLR 2026), DAPO; GRPO is the new default, and specialized variants are emerging fast
4. **RL Framework Explosion**: verl, DAPO, AReaL, slime; every major lab now has an open-source RL training framework
6. **Synthetic Data as Infrastructure**: Distilabel, Magpie, Evidently make data generation a first-class pipeline stage; model-collapse mitigation (Evol-Instruct) is becoming standard
7. **MCP Standardization**: Model Context Protocol adopted by OpenAI/Google/Microsoft as the "USB-C for AI agents"
8. **Single-GPU Research**: Unsloth + nanochat + AutoResearch enable individual developers to do serious LLM research
9. **Inference-Training Convergence**: vLLM/SGLang/TGI are now core components of RL training loops, not just serving
10. **Multimodal RL**: LLaVA-OneVision-1.5-RL and OpenRLHF-M bring RL alignment to vision-language models
11. **Extreme Quantization**: Google TurboQuant achieves 6x KV cache compression with zero accuracy loss (ICLR 2026); NVIDIA Model Optimizer unifies quantization/pruning/distillation
12. **Multi-Agent Coding Wave**: Feb 2026 saw every major tool ship multi-agent capabilities (Grok Build, Windsurf, Claude Code, Codex CLI, Devin); coding agents now routinely write training scripts
---
This curated list is released under [CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/).
---
*Compiled March 2026, updated April 2026. Project statuses may change — check individual GitHub repos for the latest.*