llama-mmap: hint THP on mmap'd weights (Linux) #19
Marxist-Leninist wants to merge 1 commit into PrismML-Eng:prism from
Conversation
Heads-up: the PrismML-Eng labeler workflow is failing. The `server/webui` rule can be written either with an `all:` block:

```yaml
server/webui:
  - changed-files:
    - all:
      - any-glob-to-any-file:
        - tools/server/webui/**
```

or in the flat form:

```yaml
server/webui:
  - changed-files:
    - any-glob-to-any-file:
      - tools/server/webui/**
```

Happy to open either fix as a separate PR if you'd like.
Force-pushed from d74dd9b to a4ce593
Issue `madvise(MADV_HUGEPAGE)` on the read-only file mapping used for the model weights on Linux. For a 1 GB model this drops the potential page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB pressure and (more importantly) the number of re-faults when pages get evicted under memory pressure.

No-op on kernels where THP is disabled. In `madvise` mode (the common modern default on desktop distros), THP is opt-in and only applied to regions that explicitly ask for it, which is what this hint does. Guarded by `defined(MADV_HUGEPAGE)` so it compiles cleanly on non-Linux.

Benchmark on a Skylake-SP VM, Bonsai-8B Q1_0, `-fa on -ctk q8_0 -ctv q8_0 -t 12 -ub 128`: neutral on this machine (~9.5 t/s tg128 both before and after) because the VM isn't memory-constrained. The change is intended for systems where the mapping does get evicted and re-faulted under pressure.
Force-pushed from a4ce593 to 036a707
Switched the PR to point at the prism branch; I just cleaned that branch and applied the pending CUDA and x86 PRs.
Pull request overview
Adds a Linux-specific madvise(MADV_HUGEPAGE) hint for the read-only mmap used to map model weights, aiming to encourage THP backing (2MB pages) and reduce TLB pressure / re-fault overhead under memory pressure.
Changes:
- On Linux, call `madvise(..., MADV_HUGEPAGE)` on the weights mapping (skipped when `numa` is enabled).
- Emit a debug log if the hint cannot be applied.
```cpp
#ifdef __linux__
    // Hint the kernel to back this region with 2MB huge pages where possible.
    // For a 1 GB model weights map this can drop the number of pages from ~262K
    // 4KB pages to ~512 2MB pages, reducing TLB pressure and (critically)
    // reducing the number of re-faults when pages get evicted under memory
    // pressure. No-op if THP is not enabled / supported.
    if (!numa) {
        if (madvise(addr, file->size(), MADV_HUGEPAGE)) {
            LLAMA_LOG_DEBUG("note: madvise(.., MADV_HUGEPAGE) not applied: %s\n",
                            strerror(errno));
        }
    }
#endif
```
What is this part doing?
Also, the x86-related code is now in the prism branch, so the only change seems to be this?
I'm not too familiar with this; is there a speed difference if we do this?
I think I will close this. After changing to the new branch, is this just doing a `LLAMA_LOG_DEBUG`?
Issue `madvise(MADV_HUGEPAGE)` on the read-only file mapping used for model weights on Linux. For a 1 GB model this drops the potential page count from ~262K 4KB pages to ~512 2MB pages, reducing TLB pressure and (more importantly) reducing the number of re-faults when pages get evicted under memory pressure.

Linux-only, guarded by `defined(MADV_HUGEPAGE)` and `__linux__`. Skipped when `numa` is set. No-op where THP is disabled.

Bench on a Skylake-SP VM, Bonsai-8B Q1_0, `-fa on -ctk q8_0 -ctv q8_0 -t 12 -ub 128`: neutral (~9.5 t/s tg128 both before and after) because the VM isn't memory-constrained. The change is intended for systems where the mapping does get evicted under pressure (constrained laptops, containers).

Tried the same hint on `ggml_aligned_malloc` for the KV/activation buffers as well; that showed a ~5% regression with no visible AnonHugePages, so I dropped that half.