# Changelog

All notable changes to QuantLLM are recorded here. The format follows
[Keep a Changelog](https://keepachangelog.com/en/1.1.0/) and the project
adheres to [Semantic Versioning](https://semver.org/).

## [Unreleased] — production hardening on top of v2.1.0rc1

### Fixed

- **`is_quantized` no longer lies about the loaded model state.** The
  attribute is now a derived property that reads
  `model.config.quantization_config` (and BitsAndBytes layer types) at
  call time; a minimal sketch follows this list. This fixes three
  concrete bugs in v2.1.0rc1:
  * `from_config_only=True` previously left `_is_quantized=True` even
    though `AutoModelForCausalLM.from_config(...)` returns a
    random-weights model with no quantization. The flag is now `False`
    and a warning is emitted to make the random-weights nature explicit.
  * A missing `bitsandbytes` install used to silently fall through to
    full precision while keeping `_is_quantized=True`. We now log a
    descriptive warning and report `False`.
  * Pre-quantized HF repos that already ship a `quantization_config`
    (GPTQ, AWQ, etc.) are now correctly reported as quantized
    regardless of the user's `quantize=False` flag.
- **`DEFAULT_ARCHITECTURE_FALLBACKS` is now actually consulted.** The
  fallback table introduced by PR #27 was dead code whenever HF
  returned a non-empty `model_type` (i.e. always). `resolve_model_type`
  now checks the table directly and recognises common version-suffix
  patterns (`qwen3` → `qwen2`, `llama4` → `llama`, `phi4` → `phi3`,
  `gemma3` → `gemma2`, etc.); see the resolver sketch after this list.
- **`register_architecture` class lookup now uses the natural API.**
  Calling `register_architecture("newmodel", base_model_type="llama",
  model_class=NewModel)` previously stored the class under `"newmodel"`
  but looked it up under `"llama"`, so the fallback path silently
  ignored it. The lookup now tries the original `config.model_type`
  first and falls back to the resolved base family.
- Removed an accidentally duplicated `if is_bnb and is_8bit ...` block
  in the existing-quant detection branch of
  `TurboModel.from_pretrained`.
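
A minimal sketch of the derived property's shape. Only
`model.config.quantization_config` and the BitsAndBytes layer check are
taken from the fix above; the class skeleton and the
`_is_quantized_override` escape hatch (used by `from_gguf`, see
"Changed") are illustrative, not the exact QuantLLM source:

```python
class TurboModelSketch:
    """Sketch only: derives quantization state instead of caching it."""

    def __init__(self, model):
        self.model = model
        self._is_quantized_override = None  # set by e.g. from_gguf

    @property
    def is_quantized(self) -> bool:
        # An explicit override (GGUF loads) wins over inspection.
        if self._is_quantized_override is not None:
            return self._is_quantized_override
        # Pre-quantized repos (GPTQ, AWQ, ...) ship a quantization_config.
        config = getattr(self.model, "config", None)
        if getattr(config, "quantization_config", None) is not None:
            return True
        # Otherwise look for bitsandbytes layer types at call time; a
        # missing install can never mean "quantized".
        try:
            import bitsandbytes as bnb
        except ImportError:
            return False
        return any(
            isinstance(m, (bnb.nn.Linear4bit, bnb.nn.Linear8bitLt))
            for m in self.model.modules()
        )
```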
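
The resolver and registry lookup, roughly. The two-step lookup order is
the fix described above; the helper names and the abbreviated table
contents are assumptions for illustration:

```python
# Abbreviated: the real table also covers Mistral, DeepSeek, OLMo, etc.
DEFAULT_ARCHITECTURE_FALLBACKS = {
    "qwen3": "qwen2",
    "llama4": "llama",
    "phi4": "phi3",
    "gemma3": "gemma2",
}

_ARCHITECTURE_REGISTRY: dict[str, type] = {}  # filled by register_architecture


def resolve_model_type(model_type: str) -> str:
    # Consult the table even when HF returns a non-empty model_type;
    # previously this path was dead code.
    return DEFAULT_ARCHITECTURE_FALLBACKS.get(model_type, model_type)


def lookup_model_class(model_type: str) -> type | None:
    # Try the name the class was registered under first, then the
    # resolved base family, so register_architecture("newmodel", ...)
    # is actually found under "newmodel".
    cls = _ARCHITECTURE_REGISTRY.get(model_type)
    if cls is None:
        cls = _ARCHITECTURE_REGISTRY.get(resolve_model_type(model_type))
    return cls
```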

### Added

- **`TurboModel.is_quantized` public property** plus
  **`TurboModel.report()`**, which returns a structured dict
  (`model_id`, `params_billion`, `requested_bits`,
  `effective_loading_bits`, `is_quantized`, `quant_method`, `device`,
  `dtype`, `finetuned`, `lora_applied`). Use `report()` to assert
  programmatically what the loader actually produced; a usage example
  follows this list.
- **Pre-quantized repo detection.** Repository names matching
  `*-bnb-4bit`, `*-bnb-8bit`, `*-AWQ`, `*-GPTQ`, `*-INT4`, `*-INT8`,
  `*-FP8`, `*-EETQ`, `*-HQQ`, `*-AQLM` log a friendly hint that the
  embedded `quantization_config` will be honoured rather than
  re-quantized; a detection sketch follows this list.
- **GGUF-only repo hint.** When a name contains `-gguf` / `.gguf`,
  `from_pretrained` warns and points the user at `from_gguf`.
- **Expanded `DEFAULT_ARCHITECTURE_FALLBACKS` table** covering Llama 2/3/4,
  Mistral / Mixtral, Qwen 2 / 2-MoE / 3, Phi / Phi-3 / Phi-4, Gemma /
  Gemma 2 / Gemma 3, Falcon, Cohere / Command-R, DeepSeek (V2/V3),
  OLMo / OLMo 2, SmolLM / SmolLM 2 / SmolLM 3, Yi, StarCoder /
  StarCoder 2, InternLM / InternLM 2, Baichuan, ChatGLM and StableLM.
- **Real CI workflow** at `.github/workflows/ci.yml` running ruff,
  pytest on Python 3.10 / 3.11 / 3.12, and `python -m build` +
  `twine check` on every PR.
- **`pyproject.toml`** providing PEP 517 / 518 build metadata, a
  conservative ruff lint profile and pytest defaults.
- **`.pre-commit-config.yaml`** for local enforcement (whitespace,
  end-of-file fixer, large-file guard, ruff with autofix).
- **`docs/guide/consumer-hardware.md`** documenting expected behaviour
  on every tier of consumer hardware (CPU-only, ≤ 8 GB VRAM,
  12 – 24 GB, Apple Silicon, multi-GPU) and how to inspect the loaded
  state.
- **Regression tests** for every fix above (a minimal test sketch
  follows this list):
  * `tests/test_quantization_state.py` — runtime quantization state
    tracking, `from_config_only` honesty, `report()` schema.
  * `tests/test_resolve_model_type.py` — fallback table consultation,
    family-suffix matching, registry-class lookup ergonomics.
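
A usage sketch for `report()`. The `turbo()` entry point is from the
v2.0.0 release notes below; the model id, the `bits=4` keyword and the
import path are assumptions for illustration:

```python
from quantllm import turbo  # assumed import path

model = turbo("meta-llama/Llama-3.2-1B", bits=4)  # hypothetical call shape
info = model.report()

# Trust what the loader actually produced, not what was requested:
# on a machine without bitsandbytes these two can disagree.
if info["requested_bits"] != info["effective_loading_bits"]:
    print(
        f"fell back to {info['effective_loading_bits']}-bit "
        f"(method={info['quant_method']}, device={info['device']})"
    )
assert isinstance(info["is_quantized"], bool)
```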
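
How the name-based hints can be matched, as a sketch; the suffix list
mirrors the bullet above, while the function names are assumptions:

```python
_PREQUANT_SUFFIXES = (
    "-bnb-4bit", "-bnb-8bit", "-awq", "-gptq", "-int4", "-int8",
    "-fp8", "-eetq", "-hqq", "-aqlm",
)


def looks_prequantized(repo_id: str) -> bool:
    # Compare case-insensitively against the repo name, org stripped.
    name = repo_id.rsplit("/", 1)[-1].lower()
    return name.endswith(_PREQUANT_SUFFIXES)


def looks_gguf_only(repo_id: str) -> bool:
    name = repo_id.lower()
    return "-gguf" in name or name.endswith(".gguf")
```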
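
The rough shape of the `from_config_only` honesty test in
`tests/test_quantization_state.py`. The tiny test repo and the
assumption that the warning surfaces as a Python `UserWarning` (rather
than a log record) are illustrative:

```python
import pytest

from quantllm import TurboModel  # assumed import path


def test_from_config_only_reports_unquantized():
    with pytest.warns(UserWarning):  # random-weights warning
        model = TurboModel.from_pretrained(
            "hf-internal-testing/tiny-random-LlamaForCausalLM",
            from_config_only=True,
            quantize=True,
        )
    # Random-weights models are never quantized, whatever was requested.
    assert model.is_quantized is False
    assert model.report()["is_quantized"] is False
```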

### Changed

- `TurboModel.__repr__` now reads from the new `is_quantized` property
  and degrades gracefully when `num_parameters()` is unavailable
  (mocked / lazily-loaded models); a sketch follows this list.
- `TurboModel.from_gguf` now sets `_is_quantized_override = True`
  rather than mutating an attribute the type system treats as a
  property; this is functionally identical but more honest about the
  contract.
- The "bitsandbytes not installed" warning now explains how to install
  it and explicitly states that loading falls back to full precision.
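
The degradation path in `__repr__`, sketched; only `num_parameters()`
and the `is_quantized` property come from the entries above, the rest
of the skeleton is illustrative:

```python
class TurboModelReprSketch:
    """Sketch only: repr that survives mocked / lazily-loaded models."""

    def __init__(self, model, is_quantized: bool):
        self.model = model
        self.is_quantized = is_quantized

    def __repr__(self) -> str:
        # num_parameters() may be absent or raise on mocks; degrade
        # instead of letting repr() itself blow up.
        try:
            params = f"{self.model.num_parameters() / 1e9:.2f}B params"
        except (AttributeError, TypeError, NotImplementedError):
            params = "params unknown"
        return f"TurboModel({params}, quantized={self.is_quantized})"
```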

## [2.0.0] — 2025-12-21

Initial public release of the `turbo()` API and the GGUF / ONNX / MLX
export pipeline. See the GitHub
[releases page](https://github.com/codewithdark-git/QuantLLM/releases/tag/v2.0.0)
for the full notes.