Commit 6b781eb

unamedkr and claude committed
quantcpp 0.9.2: add Llama-3.2-1B to model registry
Model.from_pretrained("Llama-3.2-1B") auto-downloads the Q4_K_M GGUF (~750 MB) from hugging-quants on HuggingFace. Much better response quality than the 135M starter model, suitable for Reddit demos and first-impression showcases.

README quick start now defaults to Llama-3.2-1B with SmolLM2 as a smaller alternative.

Also adds quantcpp.available_models() helper.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent a77fbe5 commit 6b781eb

3 files changed

Lines changed: 14 additions & 4 deletions

README.md

Lines changed: 3 additions & 2 deletions
@@ -35,8 +35,9 @@ pip install quantcpp
 ```python
 from quantcpp import Model
 
-# Downloads a small model automatically (~135 MB, one-time)
-m = Model.from_pretrained("SmolLM2-135M")
+# Downloads a model automatically (one-time, cached)
+m = Model.from_pretrained("Llama-3.2-1B") # ~750 MB, good quality
+# m = Model.from_pretrained("SmolLM2-135M") # ~135 MB, fastest download
 print(m.ask("What is gravity?"))
 ```
 

bindings/python/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "quantcpp"
-version = "0.9.1"
+version = "0.9.2"
 description = "Single-header LLM inference engine with KV cache compression (7× compression at fp32 parity)"
 readme = "README.md"
 license = { text = "Apache-2.0" }
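
The version string is bumped in two places in this commit: here in `pyproject.toml` and in the hard-coded fallback in `bindings/python/quantcpp/__init__.py`. The following is a minimal sketch of that lookup-with-fallback pattern, not the package's actual code. `resolve_version` is a hypothetical helper name, and it catches `PackageNotFoundError` instead of the broader `Exception` used in the diff.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str, fallback: str) -> str:
    """Return the installed distribution's version, or `fallback` if it is not installed.

    Hypothetical helper for illustration; quantcpp inlines this logic in a
    try/except at import time.
    """
    try:
        # Reads the version from the installed package metadata.
        return version(dist_name)
    except PackageNotFoundError:
        # Editable or source-tree imports may have no installed metadata.
        return fallback


if __name__ == "__main__":
    print(resolve_version("quantcpp", "0.9.2"))
```

Because the fallback is a literal, it has to be updated by hand whenever `pyproject.toml` changes, which is why both files appear in this diff.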

bindings/python/quantcpp/__init__.py

Lines changed: 10 additions & 1 deletion
@@ -19,7 +19,7 @@
     from importlib.metadata import version as _pkg_version
     __version__ = _pkg_version("quantcpp")
 except Exception:
-    __version__ = "0.9.1" # fallback for editable / source-tree imports
+    __version__ = "0.9.2" # fallback for editable / source-tree imports
 
 import os
 import sys
@@ -53,8 +53,17 @@
         "smollm2-135m-instruct-q8_0.gguf",
         135,
     ),
+    "Llama-3.2-1B": (
+        "hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF",
+        "llama-3.2-1b-instruct-q4_k_m.gguf",
+        750,
+    ),
 }
 
+def available_models():
+    """List available model names for ``from_pretrained``."""
+    return sorted(_MODEL_REGISTRY.keys())
+
 
 
 def _download_with_progress(url: str, dest: Path, desc: str) -> None:
     """Download a file with a tqdm-free progress bar (stdlib only)."""
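
Each registry entry added above is a tuple of (repository id, GGUF filename, approximate size in MB). As a rough illustration of how such an entry could be turned into a download URL, here is a self-contained sketch. `resolve_url` is a hypothetical helper, and the `https://huggingface.co/<repo>/resolve/main/<file>` pattern is an assumption about how the library builds the URL that it passes to `_download_with_progress`; the commit does not show that code.

```python
from typing import Dict, List, Tuple

# Local copy of the registry shape from the diff:
# model name -> (Hugging Face repo id, GGUF filename, approximate size in MB).
# Only the entry added in this commit is reproduced; the SmolLM2 entry's
# repo id is not visible in the hunk, so it is left out.
_MODEL_REGISTRY: Dict[str, Tuple[str, str, int]] = {
    "Llama-3.2-1B": (
        "hugging-quants/Llama-3.2-1B-Instruct-Q4_K_M-GGUF",
        "llama-3.2-1b-instruct-q4_k_m.gguf",
        750,
    ),
}


def available_models() -> List[str]:
    """List available model names, sorted alphabetically (as in the commit)."""
    return sorted(_MODEL_REGISTRY.keys())


def resolve_url(name: str) -> str:
    """Hypothetical helper: map a registry name to a direct download URL.

    Assumes the file lives on the repo's `main` branch.
    """
    try:
        repo, filename, _size_mb = _MODEL_REGISTRY[name]
    except KeyError:
        # Point the caller at the valid names instead of a bare KeyError.
        raise ValueError(
            f"unknown model {name!r}; choose from {available_models()}"
        ) from None
    return f"https://huggingface.co/{repo}/resolve/main/{filename}"


if __name__ == "__main__":
    for model_name in available_models():
        print(model_name, "->", resolve_url(model_name))
```

Keeping the size in the registry lets a caller show the expected download size before any network request is made, which matches the `~750 MB` hint in the README.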
