Official Implementation of "Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding" (ICML'25)
Python · Updated May 14, 2026
Biological code organization system with 1,029+ production-ready snippets - 95% token reduction for Claude/GPT with AI-powered discovery & offline packs
Reduce Claude AI token consumption. Zero install to start. Python 3.7+ for auto-manifest generation.
Advanced token reduction and prompt optimization framework for LLMs, featuring linguistic, algorithmic, and architectural patterns.
Do dense LMs develop MoE-like specialization as they scale? Measure it, visualize it, and turn it into speed.
Packet-Switched Attention for stable 2-bit quantized MoE inference, with variance-aware routing and Protocol C benchmarks.
TokenCave is a browser extension for Claude AI that helps you monitor and optimize token usage with real-time counters, usage insights, and a “caveman mode” that dramatically reduces output length while preserving technical accuracy.