Closed
14 changes: 14 additions & 0 deletions src/llama-mmap.cpp
@@ -451,6 +451,20 @@ struct llama_mmap::impl {
    throw std::runtime_error(format("mmap failed: %s", strerror(errno)));
}

#ifdef __linux__
// Hint the kernel to back this region with 2MB huge pages where possible.
// For a 1 GB model weights map this can drop the number of pages from ~262K
// 4KB pages to ~512 2MB pages, reducing TLB pressure and (critically)
// reducing the number of re-faults when pages get evicted under memory
// pressure. No-op if THP is not enabled / supported.
if (!numa) {
    if (madvise(addr, file->size(), MADV_HUGEPAGE)) {
        LLAMA_LOG_DEBUG("note: madvise(.., MADV_HUGEPAGE) not applied: %s\n",
                strerror(errno));
    }
}
#endif
Comment on lines +454 to +467
Collaborator
What is this part doing?
Also, the x86-related code is now in the prism branch, so this seems to be the only change left here?
I'm not too familiar with this.

Is there a speed difference if we do this?
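For reference, here is a minimal standalone sketch of what the hunk does, so the page-count claim in the code comment can be checked and the hint can be tried outside llama.cpp. Both `pages_needed` and `hugepage_hint_accepted` are hypothetical helpers written for this illustration; they are not part of llama.cpp. Unlike the PR, which advises a file-backed mapping, the sketch uses an anonymous mapping, so it only shows whether the kernel accepts the hint, not whether model loading gets faster.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdio>
#include <cstring>

#if defined(__linux__)
#include <sys/mman.h>
#endif

// Number of pages needed to cover `bytes` with pages of `page_size` bytes,
// rounding up. Hypothetical helper, for illustration only.
constexpr std::size_t pages_needed(std::size_t bytes, std::size_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

// Hypothetical helper: maps `bytes` of anonymous memory, asks the kernel to
// back it with transparent huge pages, and reports whether madvise() accepted
// the hint. Returns false on non-Linux builds or if the mapping fails.
// Assumption: an anonymous mapping stands in for the model file mapping.
inline bool hugepage_hint_accepted(std::size_t bytes) {
#if defined(__linux__) && defined(MADV_HUGEPAGE)
    void * addr = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr == MAP_FAILED) {
        return false;
    }
    const int rc = madvise(addr, bytes, MADV_HUGEPAGE);
    if (rc != 0) {
        // Typically EINVAL when the kernel was built without THP support.
        std::fprintf(stderr, "madvise(MADV_HUGEPAGE) not applied: %s\n",
                     std::strerror(errno));
    }
    munmap(addr, bytes);
    return rc == 0;
#else
    (void) bytes;
    return false;
#endif
}
```

A successful `madvise` only records the hint; it does not guarantee that huge pages are actually used. To answer the speed question, it would probably be worth benchmarking load and inference with and without the call, and checking the huge-page counters in `/proc/meminfo` (for example `AnonHugePages`) while the model is mapped.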

if (prefetch > 0) {
    if (posix_madvise(addr, std::min(file->size(), prefetch), POSIX_MADV_WILLNEED)) {
        LLAMA_LOG_WARN("warning: posix_madvise(.., POSIX_MADV_WILLNEED) failed: %s\n",