GGUF type-ID contract — v2

Authoritative assignments for enum ggml_type and enum llama_ftype in this fork. Every cherry-pick from a contributing fork MUST renumber to match this table before landing on a branch.

This document is normative. If reality disagrees with this document, fix the code, not the document — unless the contract itself is being revised, in which case bump the version at the top and call out the change.

Why this exists

Five contributing forks have independently extended ggml_type in mutually incompatible ways. Concrete known collisions:

Slot	Mainline	TheTom (TQ-KV HEAD)	TheTom (alpha-scaling, stale)	Buun (master)	Carlosfundora (1-bit-turbo)	Turbo-tan (main)	ik_llama (main)
41	`Q1_0`	`Q1_0` (mainline-aligned)	`TURBO3_0`	`Q1_0`	(gap)	`Q1_0`	`Q1_0_G128`
42	—	`TURBO2_0`	`TURBO4_0`	`TURBO3_0`	`Q1_0`	—	—
43	—	`TURBO3_0`	`TURBO2_0`	`TURBO4_0`	`Q1_0_g128` (removed; slot returned to mainline reserve)	—	—
44	—	`TURBO4_0`	`TQ3_1S`	`TURBO2_0`	`PLANAR3_0`	`TQ3_1S` (different layout from TheTom's)	—
45	—	`TQ3_1S`	`TQ4_1S`	`TURBO3_TCQ`	`PLANAR4_0`	—	—
46	—	`TQ4_1S`	—	`TURBO2_TCQ`	`ISO3_0`	`TQ3_4S`	—
47	—	—	—	—	`ISO4_0`	—	—

Note (2026-05-12, recon/06): TheTom HEAD branch is feature/turboquant-kv-cache; alpha-scaling is now stale and superseded. Between alpha-scaling and TQ-KV, the TURBO_*_0 trio was reordered (TURBO2/3/4 = 42/43/44 instead of 43/41/42) and the TQ_1S types shifted up by one slot. This affects the cherry-pick recipe's "FROM" mapping but not this fork's own slot assignments (60–95 zone unchanged).

GGUF files produced by any one fork are silently misread by any other. Cherry-picking without renumbering would propagate this hazard into this fork.
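The collision table can be turned into a mechanical check. The sketch below transcribes the table's columns into Python dictionaries, leaving out TheTom's stale alpha-scaling branch and Carlosfundora's removed slot-43 entry. The dictionary layout and the `find_collisions` helper are illustrative and do not exist in any fork. It reports every slot where two forks disagree on the type name.

```python
from collections import defaultdict

# Slot -> type name per fork, transcribed from the collision table.
FORK_TABLES = {
    "mainline": {41: "Q1_0"},
    "thetom": {41: "Q1_0", 42: "TURBO2_0", 43: "TURBO3_0", 44: "TURBO4_0",
               45: "TQ3_1S", 46: "TQ4_1S"},
    "buun": {41: "Q1_0", 42: "TURBO3_0", 43: "TURBO4_0", 44: "TURBO2_0",
             45: "TURBO3_TCQ", 46: "TURBO2_TCQ"},
    "carlosfundora": {42: "Q1_0", 44: "PLANAR3_0", 45: "PLANAR4_0",
                      46: "ISO3_0", 47: "ISO4_0"},
    "turbo-tan": {41: "Q1_0", 44: "TQ3_1S", 46: "TQ3_4S"},
    "ik_llama": {41: "Q1_0_G128"},
}


def find_collisions(tables):
    """Return {slot: {fork: name}} for slots where forks disagree on the name."""
    by_slot = defaultdict(dict)
    for fork, slots in tables.items():
        for slot, name in slots.items():
            by_slot[slot][fork] = name
    return {slot: forks for slot, forks in by_slot.items()
            if len(set(forks.values())) > 1}


for slot, forks in sorted(find_collisions(FORK_TABLES).items()):
    print(slot, forks)
```

The check compares names only. It cannot detect two forks that reuse one name for different layouts, as with `TQ3_1S` at slot 44 in turbo-tan and in TheTom's stale branch.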

Partitioning policy

ggml_type is a uint32_t-valued enum. We partition the address space into fixed-purpose zones:

Range	Purpose	Owner
0–41	Mainline core types	upstream ggml-org/llama.cpp
42–59	Mainline growth reserve — DO NOT USE	upstream (future)
60–95	Fork extensions — new types from contributing forks	this project
96–199	ik_llama compatibility zone	preserve ik_llama assignments
200–255	Row-interleaved / packed variants	preserve ik_llama R-suffix layout

Why a mainline growth reserve. Mainline added Q1_0 at slot 41 after several forks had already placed their own types at 41. Mainline will keep adding types in this range. This fork refuses to play tug-of-war for these slots. We accept whatever mainline assigns; we never compete.

Why a high-number zone (60–95). Same reason ik_llama did it: mainline would have to add 18 more types (filling slots 42–59) before it could collide with a fork extension.

Why preserve ik_llama's 96+ assignments. Pragmatic: ik_llama GGUFs are the most numerous fork-quantized files in the wild. Renumbering them would require either (a) a loader compatibility shim or (b) forcing users to re-quantize. Preserving the IDs lets us read existing ik_llama GGUFs without modification.
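The partition table maps directly onto a lookup. The following is a minimal sketch, assuming a hypothetical `zone_of` helper (not an existing ggml function), that classifies a numeric type ID by zone and confirms the zones tile 0–255 without gaps or overlaps.

```python
# (first slot, last slot, purpose), copied from the partition table.
ZONES = [
    (0, 41, "mainline core"),
    (42, 59, "mainline growth reserve"),
    (60, 95, "fork extensions"),
    (96, 199, "ik_llama compatibility"),
    (200, 255, "row-interleaved / packed"),
]


def zone_of(type_id):
    """Return the purpose of the zone that contains type_id."""
    for lo, hi, purpose in ZONES:
        if lo <= type_id <= hi:
            return purpose
    raise ValueError(f"type id {type_id} is outside the partitioned range 0-255")


def zones_are_contiguous(zones):
    """True if each zone starts exactly one slot after the previous one ends."""
    return all(prev[1] + 1 == cur[0] for prev, cur in zip(zones, zones[1:]))


print(zone_of(61))   # a TurboQuant KV type
print(zone_of(137))  # an ik_llama IQK type
```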

Fork extension zone (60–95) — canonical assignments

60–65: TurboQuant KV family (source: TheTom `feature/turboquant-kv-cache`)

Source-fork canonical branch confirmed by recon/06-thetom-branches.md (2026-05-12). Earlier drafts of this document named feature/alpha-scaling as the source; alpha-scaling has since been superseded by TQ-KV (its only substantive unique commit is the optional TURBO_ALPHA env var knob). All "TheTom name (renamed)" slot numbers below reflect TQ-KV HEAD as of 5aeb2fdbe.

Slot	Type name	TheTom name (renamed)	Description
60	`GGML_TYPE_TURBOQ2_0`	`TURBO2_0` (42)	2-bit PolarQuant, no QJL
61	`GGML_TYPE_TURBOQ3_0`	`TURBO3_0` (43)	2-bit PolarQuant + 1-bit QJL
62	`GGML_TYPE_TURBOQ4_0`	`TURBO4_0` (44)	4-bit PolarQuant (default `TURBOQ4_USE_4BIT=1`; legacy 3-bit+QJL mode available via `TURBOQ4_USE_4BIT=0`)
63	`GGML_TYPE_TURBOQ8_0`	buun `TURBO8_0`	8-bit KV: FWHT + uniform 256-level grid (`centroid[i]=(i-127.5)/127.5`) + per-block absmax, no QJL, no PolarQuant codebook. CLI string `turboq8`; block `block_turboq8_0` = 130 bytes (fp16 absmax + 128×uint8), 8.125 bpw. CPU + CUDA/HIP fattn-vec; no Vulkan kernel yet.
64	`GGML_TYPE_TURBOQ5_0`	ygg (TODO 250)	5-bit KV: FWHT + uniform 32-level grid (`centroid[i]=(i-15.5)/15.5`) + per-block absmax, no QJL, no PolarQuant codebook. Extends `turboq8` design; `q5_0`-style index split (low nibble in `qs`, high 1 bit in `qh`). CLI string `turboq5`; block `block_turboq5_0` = 82 bytes, 5.125 bpw. CPU + CUDA/HIP fattn-vec; no Vulkan yet. See features/turboquant-hibit-kv.md.
65	`GGML_TYPE_TURBOQ6_0`	ygg (TODO 250)	6-bit KV: FWHT + uniform 64-level grid (`centroid[i]=(i-31.5)/31.5`) + per-block absmax, no QJL, no PolarQuant codebook. Extends `turboq8` design; `q6_K`-style index split (low nibble in `qs`, high 2 bits in `qh`). CLI string `turboq6`; block `block_turboq6_0` = 98 bytes, 6.125 bpw. CPU + CUDA/HIP fattn-vec; no Vulkan yet. See features/turboquant-hibit-kv.md.

Slot-65 reassignment (2026-06-22, TODO 250). Slot 65 was previously a doc-only reservation for GGML_TYPE_TURBOQ3_NATIVE (turbo-tan TQ3_0, 200). That type was never landed in ggml/include/ggml.h (zero code references fork-wide), so slot 65 was free in the enum and is now assigned to GGML_TYPE_TURBOQ6_0. If turbo-tan TQ3_0 is ported later it must take a fresh free slot, not 65.

Symbol prefix: turboq_ (kernels), TURBOQ_ (constants). The Q suffix disambiguates from the TURBO*_0 collisions in contributing forks.
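The block sizes and bpw figures quoted for `turboq5`, `turboq6`, and `turboq8` can be cross-checked with simple arithmetic. The sketch below assumes all three use 128-element blocks with a 2-byte fp16 absmax header. The table states this for `turboq8`; for the other two it is an inference that matches the quoted sizes. The function names are illustrative.

```python
def turboq_block_bytes(bits, elems=128, header_bytes=2):
    """Bytes per block: fp16 absmax header plus `elems` indices of `bits` bits each."""
    index_bits = elems * bits
    if index_bits % 8:
        raise ValueError("indices must pack into whole bytes")
    return header_bytes + index_bits // 8


def bits_per_weight(block_bytes, elems=128):
    """Effective storage cost per element, header included."""
    return block_bytes * 8 / elems


for bits in (5, 6, 8):
    size = turboq_block_bytes(bits)
    print(f"turboq{bits}: {size} bytes, {bits_per_weight(size)} bpw")
```

The constant 0.125 bpw overhead in all three figures is the 16-bit header amortized over 128 elements.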

66–71: TCQ + InnerQ KV family (source: buun `master` (TCQ) / TheTom (InnerQ))

Slot	Type name	Source name (renamed)	Description	Block size	Vulkan
66	`GGML_TYPE_TURBOQ2_TCQ`	`TURBO2_TCQ` (buun 46)	TCQ k=2, L=8, 256 states	36 bytes	CPU fallback
67	`GGML_TYPE_TURBOQ3_TCQ`	`TURBO3_TCQ` (buun 45)	TCQ k=3, Viterbi-decoded	52 bytes	CPU fallback
68	`GGML_TYPE_TURBOQ2_INNERQ`	`TURBO2_INNERQ` (ft2 67)	2-bit + InnerQ per-channel equalization; block_turboq2_0	34 bytes	CPU fallback (no .comp shaders in ft2)
69	`GGML_TYPE_TURBOQ3_INNERQ`	`TURBO3_INNERQ` (ft2 68)	3-bit + InnerQ per-channel equalization; block_turboq3_0	50 bytes	CPU fallback
70	(retired/reserved)	—	Was `GGML_TYPE_TURBOQ4_INNERQ`: 4-bit InnerQ alias of TURBOQ4_0; InnerQ equalization regresses quality at 4-bit (PPL 9.08 vs 7.47, ft2 ccfe39d675). Slot permanently retired — do not reuse.	—	—
71	`GGML_TYPE_KV_OSCAR_INT2`	new — OScaR Phase 1	FHT + per-block min-max uniform INT2 (arXiv:2605.19660); Phase 1 CUDA-only; Phase 2 adds F16 residual window (R=128) + hybrid-memory-chain constructor-chain propagation	36 bytes	CPU fallback

Note: InnerQ is K-cache runtime quantization only (not weight quantization); calibration state is per-session, no GGUF persistence.

Symbol prefix: turboq_tcq_ (TCQ), turboq_innerq_ (InnerQ). TCQ extends the TurboQuant family conceptually but uses Viterbi-coded trellises instead of scalar codebooks. InnerQ applies per-channel K-cache equalization before WHT rotation; wire format identical to the corresponding TURBOQ_0 block structs.

72–79: Reserved (formerly RotorQuant KV family — removed)

Slots 72–75 previously held the RotorQuant KV family (RQ_PLANAR3_0, RQ_PLANAR4_0, RQ_ISO3_0, RQ_ISO4_0) ported from carlosfundora 1-bit-turbo. The family was removed (55bb0d418): ISO3_0 was strictly dominated (+23.5% PPL vs comparable TurboQ types), and all four types were zero-rotation scalar duplicates with no recoverable advantage at identical or lower bpw.

Slot	Status
72	reserved (formerly `GGML_TYPE_RQ_PLANAR3_0`; removed)
73	reserved (formerly `GGML_TYPE_RQ_PLANAR4_0`; removed)
74	reserved (formerly `GGML_TYPE_RQ_ISO3_0`; removed)
75	reserved (formerly `GGML_TYPE_RQ_ISO4_0`; removed)
76–79	reserved

80–85: WHT weight family (source: TheTom `feature/turboquant-kv-cache`)

Originally drafted against pr/tq4-weight-compression; that branch is fully subsumed by feature/turboquant-kv-cache (zero unique commits by subject — see recon/06-thetom-branches.md). All slot numbers below reflect TQ-KV HEAD as of 5aeb2fdbe.

Slot	Type name	TheTom name (renamed)	Description
80	`GGML_TYPE_WHT3_0`	`TQ3_1S` (45)	WHT-rotated 8-level Lloyd-Max, block_size=32
81	`GGML_TYPE_WHT4_0`	`TQ4_1S` (46)	WHT-rotated 16-level Lloyd-Max, block_size=32
82	`GGML_TYPE_WHT5_0`	— (yggdrasil extension)	WHT-rotated 32-level Lloyd-Max, block_size=32 (6.0 bpw)
83	`GGML_TYPE_WHT6_0`	— (yggdrasil extension)	WHT-rotated 64-level Lloyd-Max, block_size=32 (7.0 bpw)
84	`GGML_TYPE_WHT8_0`	— (yggdrasil extension)	WHT-rotated 256-level Lloyd-Max, block_size=32 (9.0 bpw)
85	reserved		future WHT variant

Symbol prefix: wht_. The TQ prefix in TheTom's naming collided with turbo-tan's RaBitQ TQ3 family; renaming to WHT reflects the actual transform (Walsh-Hadamard) and breaks the collision.

WHT5_0/WHT6_0/WHT8_0 (slots 82/83/84, 2026-06-22): yggdrasil extensions of TheTom's WHT lineage to wider Lloyd-Max codebooks (no upstream TheTom counterpart — the rotation + dual-half-scale block design and quantizer are TheTom's, the 5/6/8-bit codebooks and index packings are new). FTYPEs MOSTLY_WHT5_0=59, MOSTLY_WHT6_0=60, MOSTLY_WHT8_0=61. Status: functional (CPU + CUDA/HIP dequant→cuBLAS path); fused mmvq + Vulkan deferred. Credit: TheTom (WHT method).
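The bpw figures quoted for WHT5_0, WHT6_0, and WHT8_0 are each exactly one bit above the index width. That is what a 32-element block carrying two 16-bit scales would cost. The sketch below works through that arithmetic. The 16-bit scale width is an assumption chosen because it reproduces the quoted numbers; it is not read from the block struct.

```python
def wht_bpw(index_bits, block_size=32, n_scales=2, scale_bits=16):
    """Bits per weight for a block of packed indices plus per-block scales.

    n_scales=2 mirrors the dual-half-scale design; scale_bits=16 is assumed.
    """
    total_bits = block_size * index_bits + n_scales * scale_bits
    return total_bits / block_size


for bits in (5, 6, 8):
    print(f"WHT{bits}_0: {wht_bpw(bits)} bpw")
```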

86–91: RaBitQ weight family (source: turbo-tan `main`)

Slot	Type name	Turbo-tan name (renamed)	Description
86	`GGML_TYPE_RBQ3_1S`	`TQ3_1S` (44)	RaBitQ 3-bit, two half-block scales
87	`GGML_TYPE_RBQ3_4S`	`TQ3_4S` (46)	RaBitQ 3-bit, four u8 per-8 scales (4.0 bpw)
88–91	reserved		future RaBitQ variants

Symbol prefix: rbq_. Disambiguates from TheTom's WHT family.

92–95: unanticipated weight quants

Slot	Name	Source	Notes
92	`GGML_TYPE_WQ3_TCQ`	buun `feat/tcq-wq3-ffn-fusion`	TurboQuant 3-bit weight quant: TCQ (k=3, L=9, 512 states) + FWHT rotation. Re-slotted from buun's upstream `46` to avoid a mid-enum renumber of our relocated KV types. GPU-only dequant; reuses the 52-byte `block_turboq3_tcq` layout (128 elems, 3.25 bpv). CUDA-first (Ph1); CPU/HIP/Vulkan + quantizer in Ph2–4. See `docs/features/wq3-tcq.md`.
93–95	reserved		future weight quant extensions

WQ3_TCQ landed here (not the 80–85 WHT zone) because it is neither a WHT nor a RaBitQ variant: it is a trellis-coded (TCQ) weight quant, the first of its kind, so it takes the dedicated unanticipated-weight reserve. This also left the then in-flight WHT5/6/8 slots (82–84) free.

ik_llama compatibility zone (96–199) — preserved IDs

ik_llama's chosen IDs are preserved verbatim. Renumbering would break existing ik_llama-quantized GGUFs. The full list of preserved assignments:

Slot	Name	Description
97	`Q8_0_X4`	interleaved 8-bit, ×4 packing
98	`Q8_1_X4`	interleaved 8-bit (signed-bias), ×4 packing
99	`Q8_2_X4`	interleaved 8-bit (variant), ×4 packing
133	`Q6_0`	revived legacy format
134	`IQ1_BN`	BitNet 1-bit
135	`IQ2_BN`	BitNet 2-bit
136	`Q8_K64`	K-quant with 64-element blocks
137	`IQ2_K`	IQK 2-bit imatrix-aware weight quant (2.375 bpw) — Phase 5b-1a; ygg canonical
138	`IQ3_K`	IQK 3-bit imatrix-aware weight quant (3.44 bpw) — Phase 5b-1a; ygg canonical
139	`IQ4_K`	IQK 4-bit imatrix-aware weight quant (4.50 bpw) — Phase 5b-1a; ygg canonical
140	`IQ5_K`	IQK 5-bit imatrix-aware weight quant — Phase 5b-2 S1 `f7a489de5`; ygg canonical
141	`IQ6_K`	IQK 6-bit imatrix-aware weight quant — Phase 5b-2 S1 `f7a489de5`; ygg canonical
144	`IQ4_KS`	IK-quant small; row_meta=4 bytes (float row-scale) — Phase 5b-1b; ygg canonical
145	`IQ2_KS`
146	`IQ4_KSS`	IK-quant small-small; row_meta=4 bytes — Phase 5b-1b; ygg canonical
147–151	`Q8_K16`, `Q8_K32`, `Q8_KR8`, `Q8_K128`, `Q8_KV`	Q8 K-block variants
152	`IQ5_KS`
153–154	`IQ2_KT`, `IQ3_KT`	trellis weight quants (dormant; preserve IDs)
155	`IQ4_KT`	IK trellis 4-bit weight quant; row_meta=4 bytes — Phase 5b-1b; ygg canonical (differs from buun TCQ: IQ4_KT is a weight quant, TCQ is a KV-cache quant)
156	`IQ3_KS`	IK-quant small 3-bit; row_meta=2 bytes (uint16_t half-row-scale) — Phase 5b-1b; ygg canonical
157	`IQ2_KL`	IQK 2-bit low-bpw (2.6875 bpw) imatrix-aware weight quant — Phase 5b-1c S1 `e404274b9`; ygg canonical
158	`IQ1_KT`	trellis 1-bit (dormant)

Row-interleaved / packed variants (200–255)

Preserve ik_llama's R-suffix layout verbatim:

Slot	Name
202	`Q4_0_R8`
206	`Q5_0_R4`
208	`Q8_0_R8`
210–214	`Q2_K_R4`, `Q3_K_R4`, `Q4_K_R4`, `Q5_K_R4`, `Q6_K_R4`
216–223	`IQ2_XXS_R4`, `IQ2_XS_R4`, `IQ3_XXS_R4`, `IQ1_S_R4`, `IQ4_NL_R4`, `IQ3_S_R4`, `IQ2_S_R4`, `IQ4_XS_R8`
229	`IQ1_M_R4`
230	`BF16_R16`

Slots 200–201, 203–205, 207, 209, 215, 224–228, 231–255 are reserved for future packed-variant additions.
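The reserved list above can be derived from the assigned slots rather than maintained by hand. The sketch below takes the assigned slots from the table and computes the free ranges in 200–255. `free_ranges` is an illustrative helper, not an existing function.

```python
# Assigned slots in the 200-255 zone, copied from the table above.
USED = {202, 206, 208, *range(210, 215), *range(216, 224), 229, 230}


def free_ranges(used, lo=200, hi=255):
    """Return inclusive (start, end) ranges of slots in [lo, hi] not in `used`."""
    ranges, start = [], None
    for slot in range(lo, hi + 2):  # one past `hi` to flush the last open range
        is_free = slot <= hi and slot not in used
        if is_free and start is None:
            start = slot
        elif not is_free and start is not None:
            ranges.append((start, slot - 1))
            start = None
    return ranges


print(free_ranges(USED))
```

Running it yields the same seven ranges listed in the reserved-slots sentence, so that sentence and the table are consistent.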

Turbo-tan's TQ3_0 = 200 (KV-cache only) was tentatively earmarked for the TurboQuant KV zone, but the earlier draft assignment GGML_TYPE_TURBOQ3_NATIVE = 65 was never landed in ggml/include/ggml.h (doc-only reservation, zero code references). Slot 65 has since been assigned to GGML_TYPE_TURBOQ6_0 (2026-06-22, TODO 250). If turbo-tan's TQ3_0 is ported later it must take a fresh free slot, not 65 — and revisit whether it is a TurboQuant variant at all.

llama_ftype assignments

enum llama_ftype values for the MOSTLY_* variants are derived from ggml_type via:

LLAMA_FTYPE_MOSTLY_<NAME> = <next-available-slot>

Assignment order follows ggml_type numeric order, starting at the first unused mainline ftype slot (currently 41 after mainline's Q1_0=40).

llama_ftype assignments are mechanical; this document does not enumerate them. They are settled at the moment each ggml_type lands. The implementation that adds a new ggml_type MUST also add the corresponding LLAMA_FTYPE_MOSTLY_* in the same commit.

Reader compatibility for legacy fork GGUFs

We will NOT silently re-interpret legacy fork-specific GGUFs. If a user brings a GGUF quantized with (e.g.) buun's TURBO3_0=42, this fork's loader will fail with an explicit error:

unrecognized ggml_type 42 in <file.gguf>. This appears to be a buun-fork
GGUF; this fork places TurboQuant3 at type 61. Re-quantize with
`llama-quantize <model> turboq3_0`.

A future optional loader flag (--legacy-fork-ids=<fork-name>) MAY implement on-the-fly remapping. This is not part of the current contract.
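The fail-loudly behavior can be sketched as follows. This is an illustration in Python and is not the loader's actual code. `LEGACY_HINTS`, `UnrecognizedTypeError`, and `check_ggml_type` are hypothetical names, and the hint table holds only the one case from the example message.

```python
# Hypothetical hint table: legacy slot -> (fork, label, slot in this fork, CLI type string).
LEGACY_HINTS = {
    42: ("buun", "TurboQuant3", 61, "turboq3_0"),
}


class UnrecognizedTypeError(ValueError):
    """Raised instead of silently reinterpreting an unknown type ID."""


def check_ggml_type(type_id, path, known_types):
    """Return type_id if this fork defines it; otherwise raise with a hint."""
    if type_id in known_types:
        return type_id
    msg = f"unrecognized ggml_type {type_id} in {path}."
    hint = LEGACY_HINTS.get(type_id)
    if hint:
        fork, label, new_slot, cli = hint
        msg += (f" This appears to be a {fork}-fork GGUF; this fork places"
                f" {label} at type {new_slot}. Re-quantize with"
                f" `llama-quantize <model> {cli}`.")
    raise UnrecognizedTypeError(msg)


# Illustrative known set: mainline core plus three fork types.
KNOWN = set(range(42)) | {60, 61, 62}
try:
    check_ggml_type(42, "model.gguf", KNOWN)
except UnrecognizedTypeError as err:
    print(err)
```

The hint is best-effort. Slot 42 carries a different type in several forks, so the ID alone cannot prove which fork produced the file.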

Policy for adding new types

When this fork grows a new quant type (whether ported from a fork or invented):

Allocate the lowest-numbered available slot in the appropriate family zone (60–95). If the family zone is full, expand into 92–95 reserves before considering 96+ (which is owned by ik_llama compat).
The naming MUST follow the family's symbol prefix.
The ggml_type, llama_ftype, type-traits row, and CPU vecdot must land in the same commit. Partial landings are rejected.
Update this document in the same PR.
The PPL regression harness must include the new type before the PR can merge. No exceptions.
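The slot-allocation rule can be sketched as a small function: take the lowest free slot in the family's own range, then fall back to the 92–95 reserve, and never reuse a retired slot. The family keys and the `allocate_slot` helper are illustrative names, with ranges copied from the zone headings in this document.

```python
# Family ranges from the fork extension zone headings (end-exclusive ranges).
FAMILY_ZONES = {
    "turboq": range(60, 66),
    "tcq_innerq": range(66, 72),
    "wht": range(80, 86),
    "rbq": range(86, 92),
}
OVERFLOW_RESERVE = range(92, 96)


def allocate_slot(family, taken, retired=frozenset()):
    """Return the lowest usable slot for `family`, spilling into 92-95 if full."""
    unavailable = set(taken) | set(retired)
    for slot in (*FAMILY_ZONES[family], *OVERFLOW_RESERVE):
        if slot not in unavailable:
            return slot
    raise RuntimeError(f"no free slot for family {family!r}; revise the contract")


# WHT currently uses 80-84, so the next WHT type would get 85.
print(allocate_slot("wht", taken={80, 81, 82, 83, 84}))
```

Keeping retired slots in a separate set stops a slot such as 70 from being handed out again even though no type occupies it.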

Open issues

TheTom's alpha-scaling branch (now stale) placed TURBO3_0 at slot 41, and production TheTom-quantized GGUFs exist with this ID. The current reader rejects them. A --legacy-fork-ids=thetom flag is a likely future addition.
Mainline may at some point claim slots 42–59 with types whose names collide with our renames. E.g., mainline could add a future GGML_TYPE_PLANAR3_0 unrelated to the (now-removed) RotorQuant family. Our policy is: rename ours, never theirs. The TURBOQ prefix is already there to absorb this. (The RQ_ prefix was used by RotorQuant, which was removed; those slots 72–75 are now reserved.)
GGUF metadata format (the part outside the type-ID enum) may also need fork-specific keys (e.g., turbo-tan's WHT rotation tables, buun's TCQ codebook indices). Out of scope for this document — to be addressed in a separate GGUF_METADATA_KEYS.md.

Version log

v1 (2026-05-12) — initial contract. Authored before any cherry-picks land. Authoritative for Phase 0+.
v2 (2026-05-22 to 2026-05-24) — Phase 5b-1a landed: IQ2_K=137, IQ3_K=138, IQ4_K=139 annotated with ygg canonical + Phase 5b-1a tag. Phase 5b-1b landed: IQ4_KS=144, IQ4_KSS=146, IQ4_KT=155, IQ3_KS=156 annotated with Phase 5b-1b tag + row_meta byte sizes. IQ4_KT separated from dormant IQ2/IQ3_KT in table entry. IQ5_K/IQ6_K noted as Phase 5b-2 recon in-flight.
v3 (2026-05-24) — Phase 5b-2 S1 landed: IQ5_K=140, IQ6_K=141 annotated with ygg canonical + Phase 5b-2 S1 tag. Phase 5b-1c S1 landed: IQ2_KL=157 annotated with ygg canonical + Phase 5b-1c S1 tag.
