docs(readme): reposition project around HF reference workflows #91
Alberto-Codes wants to merge 3 commits into main from
Conversation
Clarify that turboquant-vllm is primarily the HuggingFace/reference path for DynamicCache compression, verification, and architecture research, while the out-of-tree vLLM plugin remains an optional bridge.

- reframe README and docs index around HF/reference usage
- de-emphasize plugin-first messaging in vLLM and container docs
- highlight verify CLI and upstream native vLLM split

Refs #87
CI lint failed in uv-secure because the lockfile still resolved cryptography 46.0.6, which now reports GHSA-p423-j2cm-9vmq.

- upgrade cryptography in uv.lock to 46.0.7
- re-run uv-secure and lint-related local checks
- keep the docs change set otherwise unchanged

Refs #87
Pull request overview
Repositions repository documentation to emphasize the HuggingFace DynamicCache reference workflow (plus verification/research) as the primary project direction, while keeping the vLLM plugin path documented as an optional bridge.
Changes:
- Reframes the README to lead with HuggingFace/reference + verification workflows, and moves the vLLM plugin to an “optional bridge” section.
- Updates usage docs (vLLM/plugin, HuggingFace, container) to de-emphasize plugin-first messaging and clarify the upstream native vLLM split.
- Updates `uv.lock` to bump `cryptography` from 46.0.6 to 46.0.7.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Updates locked dependency version for cryptography. |
| README.md | Repositions project messaging around HF/reference workflows and verification; keeps plugin usage as optional. |
| docs/site/usage/vllm.md | Clarifies plugin as an optional out-of-tree bridge and points to upstream native vLLM direction. |
| docs/site/usage/huggingface.md | Labels HuggingFace DynamicCache as the primary workflow. |
| docs/site/usage/container.md | Reframes container instructions as being for the optional plugin bridge. |
| docs/index.md | Updates the repo docs index to highlight verification + HF/reference path and add a “choose the right path” guide. |
Or with [uv](https://docs.astral.sh/uv/):

```bash
TQ4_K_BITS=4 TQ4_V_BITS=3 vllm serve meta-llama/Llama-3.1-8B-Instruct --attention-backend CUSTOM
uv add turboquant-vllm
uv add turboquant-vllm --extra vllm
```
The uv install snippet currently shows two uv add commands back-to-back (base install and --extra vllm), which reads as if both should be run. This is redundant and confusing; they should be presented as alternatives (either the base install or the install with the vllm extra) so users don't add the dependency twice or wonder which command is correct.
Fixed. README now labels the two uv commands as alternatives and explains which workflow each command is for, so users do not read them as sequential steps.
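A labeled snippet along these lines would make the two commands read as alternatives rather than sequential steps (the exact wording is assumed here, not quoted from the merged README):

```bash
# Option A: base install, for the HuggingFace/reference workflow only (assumed label)
uv add turboquant-vllm

# Option B: install with the vllm extra, for the optional out-of-tree plugin bridge (assumed label)
uv add turboquant-vllm --extra vllm
```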
## Choose the Right Path

- **Use `turboquant-vllm`** when you want HuggingFace cache compression, model validation, multimodal experiments, or architecture/policy research
- **Use upstream native vLLM TurboQuant** when you want the in-tree serving path in vLLM
- **Use the plugin path here** only when you specifically need the out-of-tree bridge (`--attention-backend CUSTOM`)
This PR is described as repositioning the docs landing/index, but the rendered MkDocs site uses docs/site as docs_dir and its landing page is docs/site/index.md (see mkdocs.yml). Updates here in docs/index.md won’t affect the site landing page, so the public docs may still present the old plugin-first messaging unless the MkDocs index is updated too or this file is explicitly linked from the site.
Fixed. I updated docs/site/index.md, which is the MkDocs landing page configured by mkdocs.yml, so the public docs now lead with the HuggingFace/reference positioning instead of the old plugin-first messaging.
This repositions turboquant-vllm as the HuggingFace/reference path for DynamicCache compression, verification, and architecture research instead of presenting the out-of-tree vLLM plugin as the main long-term product direction.
Test: uv run python -m compileall src

Closes #87
PR Review
Checklist
- Tests pass (`uv run pytest`)
- Lint passes (`uv run ruff check .`)
- No `!` in title and no `BREAKING CHANGE:` in body

Review Focus
Related