Commit d74dd9b
llama-mmap: hint THP on mmap'd weights (Linux)
Issue madvise(MADV_HUGEPAGE) on the read-only file mapping used for
model weights on Linux. With a 1 GB model this drops the potential
page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB
pressure and (more importantly) reducing the number of re-faults
when pages get evicted under memory pressure.
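
The page counts above follow from dividing the mapping size by the page size. A minimal sketch of that arithmetic, assuming x86-64 page sizes (4 KiB base pages, 2 MiB huge pages) and reading "1 GB" as 1 GiB; `pages_needed` is a hypothetical helper, not something from this commit:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// Number of pages needed to cover `bytes`, rounding up to whole pages.
inline std::uint64_t pages_needed(std::uint64_t bytes, std::uint64_t page_size) {
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

// pages_needed(1 * GiB, 4 * KiB) == 262144   (~262K base pages)
// pages_needed(1 * GiB, 2 * MiB) == 512      (huge pages)
```

The ratio is 512 either way: each 2 MiB huge page replaces 512 base pages, so the upper bound on faults for a full pass over the mapping shrinks by the same factor.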
No-op on kernels where THP is disabled or set to 'never'. In
'madvise' mode (now the most common default for desktop distros),
huge pages are opt-in per region, so the caller has to ask for them
with madvise(). Guarded by defined(MADV_HUGEPAGE) so it compiles
cleanly on non-Linux.
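
As an illustration of the guard and the mode check described above (a sketch under stated assumptions, not the code added by this commit): `hint_huge_pages` and `parse_thp_mode` are hypothetical names. `madvise` and `MADV_HUGEPAGE` are the real Linux interfaces, and `/sys/kernel/mm/transparent_hugepage/enabled` is the sysfs file whose bracketed entry marks the active mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#if defined(__unix__) || defined(__APPLE__)
#include <sys/mman.h>
#endif

// Hypothetical helper: ask the kernel to consider transparent huge pages for
// [addr, addr + len). `addr` must be page-aligned, which holds for a pointer
// returned by mmap(). Returns true only if the hint was issued and accepted.
inline bool hint_huge_pages(void * addr, std::size_t len) {
    assert(addr != nullptr && len > 0);
#if defined(MADV_HUGEPAGE)
    // Advisory only: a failure here is not an error for the caller.
    return madvise(addr, len, MADV_HUGEPAGE) == 0;
#else
    (void) addr;
    (void) len;
    return false; // no THP hint on this platform; the call is compiled out
#endif
}

// Hypothetical helper: extract the active mode from the contents of
// /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always [madvise] never".
// Returns an empty string if no bracketed entry is found.
inline std::string parse_thp_mode(const std::string & contents) {
    const std::size_t open = contents.find('[');
    if (open == std::string::npos) return "";
    const std::size_t close = contents.find(']', open);
    if (close == std::string::npos) return "";
    return contents.substr(open + 1, close - open - 1);
}
```

A successful `madvise` return only means the kernel accepted the hint. Whether a file-backed mapping is actually promoted to huge pages depends on the kernel version and its configuration.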
Benchmark on a Skylake-SP VM (Bonsai-8B Q1_0, -fa on -ctk q8_0 -ctv
q8_0 -t 12 -ub 128): neutral on this machine (~9.5 t/s tg128 both
before and after) because the VM is not memory-constrained. The
change is intended for systems where the weights mapping gets
evicted and re-faulted under pressure, where the fault count
reduction from huge pages matters more than TLB pressure.

1 parent e29cd48 commit d74dd9b
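
Since the benchmark above is neutral on throughput, fault counts may be a more direct way to check the intended effect on a memory-constrained system. The following is a rough sketch, assuming a POSIX system. `fault_counts`, `current_faults`, and `touch_range` are hypothetical helpers; `getrusage` with `ru_minflt` and `ru_majflt` is the standard interface.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

struct fault_counts {
    long minor; // faults served without I/O (page already in the page cache)
    long major; // faults that required reading from disk
};

// Snapshot the process-wide fault counters.
inline fault_counts current_faults() {
    rusage ru{};
    const int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void) rc;
    return { ru.ru_minflt, ru.ru_majflt };
}

// Read one byte every `stride` bytes so that each page in the range is
// touched; returns the sum of the bytes read so the loop is not optimized out.
inline unsigned long touch_range(const volatile unsigned char * p,
                                 std::size_t len, std::size_t stride) {
    assert(stride > 0);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < len; i += stride) {
        sum += p[i];
    }
    return sum;
}
```

Taking a `current_faults()` snapshot before and after a `touch_range` pass over the weights mapping (with `stride` set to the base page size), once with and once without the hint, would show whether the number of faults actually drops under memory pressure.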
1 file changed, +14 -0: 14 lines added as lines 454-467 of the new file.