Commit a79521b
Add LongRoPE support and fp64 RoPE precompute for Phi-3 / Phi-4 family
Summary:
Adds LongRoPE plumbing and an fp64 cos/sin precompute pass to
hf_precompute_freqs_cis. Together these eliminate Phi-4 Mini decode-time
n-gram repetition under both XNNPACK and Vulkan delegates.
Phi-3 and Phi-4 family models use HF's "longrope" RoPE scaling, which
multiplies cos/sin by an attention_factor (~1.19 for Phi-4 Mini) and
divides inv_freq element-wise by a per-dimension short_factor (when
seq_len <= original_max_position_embeddings) or long_factor. ET's
hf_precompute_freqs_cis implemented vanilla RoPE and was missing both terms.
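As an illustrative sketch of the two missing terms (helper name and signature are
hypothetical, not ET's actual hf_precompute_freqs_cis API):

```python
import math

def longrope_cos_sin_sketch(dim, seq_len, theta=10000.0,
                            short_factor=None, long_factor=None,
                            original_max_pos=4096, attention_factor=1.0):
    """Illustrative only: vanilla RoPE cos/sin tables plus the two
    LongRoPE terms described above. Hypothetical helper, not ET code."""
    half = dim // 2
    inv_freq = [theta ** (-(2 * i) / dim) for i in range(half)]
    # Term 1: divide inv_freq element-wise by short_factor (when seq_len
    # is within original_max_position_embeddings) or long_factor (beyond).
    factor = short_factor if seq_len <= original_max_pos else long_factor
    if factor is not None:
        inv_freq = [f / s for f, s in zip(inv_freq, factor)]
    # Term 2: multiply cos/sin by attention_factor (~1.19 for Phi-4 Mini).
    cos = [[math.cos(t * f) * attention_factor for f in inv_freq]
           for t in range(seq_len)]
    sin = [[math.sin(t * f) * attention_factor for f in inv_freq]
           for t in range(seq_len)]
    return cos, sin
```

Dropping attention_factor recovers exactly the "softer than trained" tables
described above.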
At typical export configurations the dominant effect is the missing
attention_factor, which leaves attention scores ~1.42x softer than
the model was trained for. Compounded across 32 layers this pushes
Phi-4 Mini's narrow top-2 logit margins past their tipping point and
triggers greedy-decode n-gram repetition; the same error explains
prior on-device looping observed under both XNNPACK and Vulkan.
Adds LongRoPE plumbing through ModelArgs (short_factor, long_factor,
original_max_position_embeddings, max_position_embeddings,
rope_scaling_attention_factor) and into hf_precompute_freqs_cis,
with attention_factor derived as
sqrt(1 + log(scaling)/log(original_max)) when not explicitly set.
The non-HF precompute_freqs_cis path is left vanilla; longrope models
must set use_hf_rope=True (noted in Rope.__init__).
Also moves the cos/sin precompute to fp64, casting to fp32 once at
the end. After LongRoPE corrects the 19% scale error, fp32 ULP-level
rounding in the cos/sin tables becomes the next-largest contributor
to logit drift -- load-bearing on Vulkan under sampling: with fp32
precompute, 1/2 T=0.5 trajectories collapsed into a 4-gram loop
("avoiding data and data biases") even with LongRoPE applied. fp64
precompute is one-time at construction (microseconds on a few-KB
table); runtime tables remain fp32, so no inference-time cost.
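A minimal sketch of this pattern, assuming a numpy-style precompute
(illustrative, not ET's actual implementation):

```python
import numpy as np

def precompute_cos_sin_fp64(inv_freq, seq_len, attention_factor=1.0):
    """Build the cos/sin tables entirely in float64, then cast to
    float32 once at the end, as described above. Runtime tables
    stay fp32, so inference cost is unchanged."""
    t = np.arange(seq_len, dtype=np.float64)
    freqs = np.outer(t, np.asarray(inv_freq, dtype=np.float64))
    cos = (np.cos(freqs) * attention_factor).astype(np.float32)
    sin = (np.sin(freqs) * attention_factor).astype(np.float32)
    return cos, sin
```

The one-time fp64 pass avoids accumulating fp32 rounding inside the
multiply chain; only the final cast rounds.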
Wires the LongRoPE fields into examples/models/phi_4_mini/config/config.json
sourced from HF's Phi-4 Mini config.
Test Plan:
Validated end-to-end on Samsung Galaxy S25:
- Eager bf16 (host, 12 threads): 3/3 loop-free at T=0 greedy and
T=0.5 sampling x 2 seeds.
- XNNPACK 8da4w-g32 on device: 3/3 loop-free, ~20.7 tok/s decode.
- Vulkan 8da4w-g32 on device: 3/3 loop-free, ~17.1 tok/s decode.
Reproduced across two distinct S25 units to confirm result is not
device-specific. Verified that omitting either fix regresses Vulkan
sampling: LongRoPE alone leaves residual sampling loops, and fp64 alone
was previously known to be insufficient.
Co-Authored-By: Claude <noreply@anthropic.com>
3 files changed: 88 additions, 12 deletions