Commit e557ddd
feat: Add 12 new skills covering AI evaluation, model routing, and advanced RAG
Add comprehensive coverage for LLM evaluation, model routing/selection, and advanced RAG techniques:
LLM Evaluation (5 new skills):
- llm-benchmarks-evaluation.md (724 lines): MMLU, HellaSwag, BBH, HumanEval, TruthfulQA, GSM8K; lm-evaluation-harness, LightEval; data contamination detection
- llm-evaluation-frameworks.md (921 lines): Arize Phoenix (OpenTelemetry, self-hostable, LLM evals), Braintrust (86x faster search), LangSmith (LangChain integration), Langfuse (open-source)
- llm-as-judge.md (1089 lines): Pairwise/pointwise/reference-guided patterns, Prometheus 2 models (fine-tuned evaluators, BGB variant), G-Eval (GPT-4 with CoT), bias mitigation, uncertainty quantification
- rag-evaluation-metrics.md (969 lines): RAGAS metrics (Faithfulness, Answer Relevancy, Context Precision, Context Recall), LLM-as-judge for RAG, synthetic datasets, Arize Phoenix/Langfuse integration
- custom-llm-evaluation.md (1053 lines): Domain-specific metrics (medical, legal, code), RLHF reward models, adversarial testing (jailbreaks, prompt injection), bias/toxicity detection
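The data contamination detection listed for llm-benchmarks-evaluation.md is easier to grasp with a concrete heuristic. One common approach, used for example in the GPT-3 paper's contamination analysis, flags a benchmark item when it shares a long word n-gram with the training corpus.

The sketch below is a minimal illustration under stated assumptions:
- tokenization is simple lowercase alphanumeric words;
- the default n is 13;
- `ngrams` and `contamination_rate` are hypothetical helpers, not functions from lm-evaluation-harness or LightEval.

```python
import re


def ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercase the text, keep alphanumeric word tokens, and return its word n-grams."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n tokens yield an empty range, hence an empty set.
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(benchmark_items: list[str], corpus_docs: list[str], n: int = 13) -> float:
    """Fraction of benchmark items that share at least one word n-gram with the corpus."""
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus_docs:
        corpus_grams |= ngrams(doc, n)
    if not benchmark_items:
        return 0.0
    flagged = sum(1 for item in benchmark_items if ngrams(item, n) & corpus_grams)
    return flagged / len(benchmark_items)
```

Smaller n flags more items but also produces more false positives from common phrases. Real pipelines typically also normalize punctuation and deduplicate the corpus first.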
Model Routing & Selection (3 new skills):
- llm-model-routing.md (562 lines): RouteLLM (ICLR 2025, 85% cost reduction), RoRF random forest, semantic routing (vLLM, ModernBERT), rule-based routing, model strengths (Claude 3.5 HumanEval 92%, GPT-4o MMLU 88.7%, Gemini Flash 370 tok/s, DeepSeek 27.4x cheaper)
- llm-model-selection.md (551 lines): 2025 model landscape (GPT-4o/o1, Claude 3.5/4, Gemini 2.5, Grok 3, DeepSeek R1/V3, Llama 3.3), capability matrix, pricing analysis (Premium $10-75, Mid $1-5, Budget $0.40-1 per million tokens), strategic stack approach
- multi-model-orchestration.md (721 lines): Pipeline/ensemble/specialist/cascade/hybrid patterns, context management, error handling with fallback chains, Arize Phoenix multi-model tracing with span analysis
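The cascade pattern with a fallback chain, listed for multi-model-orchestration.md, can be sketched in a few lines. This is a minimal illustration of a confidence-threshold cascade, not RouteLLM's learned router.

It rests on these assumptions:
- each model is wrapped as a hypothetical callable that returns an answer plus a confidence score;
- tiers are ordered from cheapest to most expensive;
- `ModelResult`, `cascade`, and the 0.7 threshold are illustrative names and values.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ModelResult:
    text: str
    confidence: float  # hypothetical confidence score in [0, 1], e.g. from a judge or logprobs


# Hypothetical interface: a "model" is any callable mapping a prompt to a ModelResult.
Model = Callable[[str], ModelResult]


def cascade(prompt: str, tiers: list[tuple[str, Model]], threshold: float = 0.7) -> tuple[str, ModelResult]:
    """Query tiers cheapest-first and return (tier_name, result).

    - An exception from a tier counts as a failure; the chain falls back to the next tier.
    - A result at or above `threshold` is accepted immediately.
    - If no tier is confident enough, the most recent successful answer is returned.
    """
    fallback: tuple[str, ModelResult] | None = None
    last_error: Exception | None = None
    for name, model in tiers:
        try:
            result = model(prompt)
        except Exception as err:  # fallback chain: move on to the next tier
            last_error = err
            continue
        if result.confidence >= threshold:
            return name, result
        fallback = (name, result)  # remember the latest low-confidence answer
    if fallback is not None:
        return fallback
    raise RuntimeError("every tier in the cascade failed") from last_error
```

The threshold trades cost against quality:
- raising it sends more traffic to the expensive tiers;
- lowering it saves cost at the risk of accepting weaker answers.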
Advanced RAG (4 new skills):
- hybrid-search-rag.md (656 lines): Vector + BM25 fusion, Reciprocal Rank Fusion (RRF), parallel/sequential architectures, score normalization (min-max, z-score, softmax), Elasticsearch/Weaviate/Qdrant/Pinecone, benchmarks showing 15-30% improvement
- rag-reranking-techniques.md (623 lines): Multi-stage retrieval (fast → rerank → generate), cross-encoder models (ms-marco, BGE), tensor-based reranking (ColBERT, a 2024-2025 trend), LLM-as-reranker (GPT-4, Claude), Cohere Reranker API, nDCG/MAP/MRR metrics
- graph-rag.md (696 lines): Microsoft GraphRAG (2024), entity extraction, Leiden community detection, hierarchical summarization, local vs global queries, multi-hop reasoning, SAM-RAG/ArchRAG/LightRAG variants, Neo4j/ArangoDB, 72.5% comprehensiveness for global queries
- hierarchical-rag.md (694 lines): Multi-level document structures (chapter → section → paragraph), recursive summarization, parent-child chunks, top-down/bottom-up/hybrid retrieval, LlamaIndex/LangChain implementations, RAGAS hierarchical evaluation
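Reciprocal Rank Fusion, listed for hybrid-search-rag.md, merges the vector and BM25 result lists by rank rather than by raw score. Because only ranks are used, the two scoring scales never need to be normalized against each other.

This is a minimal sketch under two assumptions:
- each retriever returns an ordered list of document IDs;
- the constant is k = 60, the value from the original RRF paper (Cormack et al., 2009), not a value taken from the skill file.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists of doc IDs with score(d) = sum over lists of 1 / (k + rank(d)).

    Ranks start at 1. A document missing from a list simply gets no contribution from it.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Consider vector results `["d1", "d2", "d3"]` and BM25 results `["d3", "d1", "d4"]`. The fused order is d1, d3, d2, d4, because documents found by both retrievers rise to the top.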
Documentation Updates:
- skills/_INDEX.md: Added "LLM Evaluation & Routing (8 skills)" and "Advanced RAG (4 skills)" sections, updated totals (247 → 259 skills, 43 → 45 categories), added discovery patterns and quick reference entries
- README.md: Added LLM Evaluation, Model Routing, and Advanced RAG sections, updated technology coverage matrix (ML/AI: 21 → 33 skills), updated all skill counts
Key Technologies:
- Evaluation: Arize Phoenix, Braintrust, LangSmith, Langfuse, Prometheus 2, G-Eval, RAGAS
- Routing: RouteLLM, RoRF, vLLM Semantic Router, Unify
- RAG: Elasticsearch, Weaviate, Qdrant, Pinecone, Cohere Rerank, ColBERT, Neo4j, Microsoft GraphRAG
All skills include YAML frontmatter (agent-compatible), "Last Updated: 2025-10-26", code examples with 2024-2025 frameworks, anti-patterns, and related skills cross-references.
1 parent 98edf73
14 files changed
Lines changed: 9322 additions & 9 deletions