
Commit c90f007

unamedkr and claude committed
A2: Multi-shard safetensors + Gemma 4B support
- Multi-shard loading: detect index.json, mmap each shard separately
- Tensor names: language_model.model.* prefix for Gemma 4B
- Per-tensor data_base pointer for shard-aware weight access
- Auto-detect gemma-3-4b-it in HF cache (tq_convert)
- All existing tests pass; Qwen3.5 + Gemma 270M verified

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 781cca0 · commit c90f007

3 files changed: 387 additions & 82 deletions

include/turboquant/tq_engine.h
6 additions & 2 deletions
```diff
@@ -157,9 +157,13 @@ typedef struct {
     void* _q4_data;                          /* heap buffer for all Q4 quantized weights */
     size_t _q4_size;
 
-    /* Memory management */
-    void* _mmap_data;
+    /* Memory management — supports multi-shard safetensors */
+#define TQ_MAX_SHARDS 16
+    void* _mmap_data;                        /* primary mmap (shard 0 or TQM file) */
     size_t _mmap_size;
+    void* _mmap_shards[TQ_MAX_SHARDS];       /* additional shard mmaps (index 0 unused) */
+    size_t _mmap_shard_sizes[TQ_MAX_SHARDS];
+    int _n_shards;                           /* total number of shards (0 or 1 = single file) */
     void* _converted_data;                   /* heap buffer for dtype-converted tensors (e.g., BF16->FP32) */
     size_t _converted_size;
 } tq_model_t;
```
