Summary
The project now has a clearer strategic split:
- Native vLLM serving path should converge on upstream vllm-project/vllm#38479
- turboquant-vllm should become the reference implementation for:
  - HuggingFace DynamicCache workflows
  - model verification / policy tooling
  - multimodal + heterogeneous architecture validation
  - incubation of upstreamable TurboQuant ideas
This issue tracks the repo-level repositioning and the next execution slices.
Why
Recent repo/docs state and validation work point in the same direction:
- docs/index.md, docs/ARCHITECTURE.md, and docs/ROADMAP.md already say the native vLLM serving roadmap is superseded by upstream PR #38479
- The repo remains strongest where upstream is narrower or not optimized:
  - HuggingFace compression workflows
  - verify CLI / model compatibility checks
  - multimodal models (Molmo2)
  - heterogeneous / shared-KV / sliding-window architectures (Gemma 3/4)
  - experimental algorithm work (e.g. WHT / Hadamard, norm correction backports)
The goal is to stop treating the CUSTOM backend as the main long-term product and instead treat it as optional bridge / staging infrastructure.
Strategic decision
Primary identity for turboquant-vllm:
- HuggingFace reference implementation for TurboQuant KV compression
- Verification + policy engine for deciding whether/how TQ should be used on a model
- Multimodal / weird-architecture validation lab
- Incubator for ideas that may later be upstreamed into vLLM
Secondary identity:
- Optional compatibility bridge for vLLM via plugin path when useful
Non-goal:
- Competing long-term with upstream native vLLM TurboQuant as a parallel production serving stack
Proposed work breakdown
Phase 1 — messaging + product shape
Phase 2 — verification / policy tooling
- python -m turboquant_vllm.verify: from "cosine checker" to architecture-aware advisor
Phase 3 — algorithm incubation
Phase 4 — multimodal + architecture moat
Candidate follow-up issues
- docs: reposition project around HF reference + validation workflow
- feat(verify): architecture-aware compatibility and policy recommendations
- research(rotation): prototype WHT/Hadamard rotation path
- docs: add decision guide for upstream native vLLM vs turboquant-vllm
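To make the rotation research item concrete, here is a minimal sketch of a randomized Hadamard rotation: random sign flips followed by an orthonormal fast Walsh-Hadamard transform (WHT). It is an illustration only. The function names (`fwht`, `random_signs`, `rotate`, `unrotate`), the seeded sign vector, and the pure-Python list representation are assumptions, not code from this repo or from upstream vLLM.

```python
import math
import random


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform of a power-of-two-length list."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = [float(x) for x in vec]
    h = 1
    while h < n:
        # Butterfly pass: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    # Scaling by 1/sqrt(n) makes the transform orthonormal (its own inverse).
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in out]


def random_signs(n, seed=0):
    """Hypothetical helper: a reproducible vector of +1/-1 sign flips."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rotate(vec, signs):
    """Apply sign flips, then the WHT."""
    return fwht([s * x for s, x in zip(signs, vec)])


def unrotate(vec, signs):
    """Invert rotate(): apply the WHT, then the same sign flips."""
    return [s * x for s, x in zip(signs, fwht(vec))]
```

Because the transform is orthonormal, `unrotate(rotate(x, signs), signs)` recovers `x` up to floating-point error and vector norms are unchanged. A real prototype would presumably operate on tensors per attention head rather than on Python lists.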
Success criteria
- New users understand this repo's role in under 2 minutes
- The verify CLI answers "will TQ work on my model and how should I use it?"
- New architecture work lands here first, then only validated pieces move upstream
- Plugin/backend work becomes optional support infrastructure, not the strategic center
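As a rough illustration of the verify criterion above, an architecture-aware advisor might map model config traits to suggested validation steps. This is a sketch under stated assumptions: `Advice`, `advise`, the trait labels, the config keys it inspects, and the wording of each check are all hypothetical and do not describe the current verify CLI.

```python
from dataclasses import dataclass, field


@dataclass
class Advice:
    # Architecture traits detected from the config (hypothetical labels).
    traits: list = field(default_factory=list)
    # Extra validation steps suggested before enabling TQ (illustrative text).
    checks: list = field(default_factory=list)


def advise(config: dict) -> Advice:
    """Map a model config dict to traits and suggested checks.

    The keys read here are assumptions about what a config might expose.
    """
    advice = Advice()
    if config.get("sliding_window"):
        advice.traits.append("sliding-window")
        advice.checks.append("validate windowed and global layers separately")
    if (config.get("num_kv_shared_layers") or 0) > 0:
        advice.traits.append("shared-kv")
        advice.checks.append("validate layers that share KV state")
    if "vision_config" in config:
        advice.traits.append("multimodal")
        advice.checks.append("run verification on image + text prompts")
    if not advice.traits:
        # No special traits found: fall back to the basic check.
        advice.checks.append("run the standard cosine-similarity check")
    return advice
```

For example, `advise({"sliding_window": 4096, "vision_config": {}})` would report the `sliding-window` and `multimodal` traits along with their associated checks. Returning checks to run, rather than a yes/no verdict, keeps the advisor honest about what has actually been validated.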