
fix: prevent apps from crashing when LLMs are loaded#1063

Open
chmjkb wants to merge 4 commits into main from @chmjkb/mmap-load-mode

Conversation

Collaborator

@chmjkb chmjkb commented Apr 8, 2026

Description

Fix crashes when loading LLMs by using mmap and by no longer reporting the model file size as external memory pressure. This PR changes how LLM models are loaded to prevent crashes with large models. Previously, reporting the full model file size via setExternalMemoryPressure() would cause Hermes to crash, because it breaks the GC's heap accounting when the reported external memory approaches or exceeds the 3 GB max heap size.

We also set the LoadMode to Mmap instead of File, so the ExecuTorch runtime lazily pages weights into RAM on demand instead of keeping the entire file contents in memory, which prevents the OS from killing the app.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • Take a large model, verify it crashes the app on main and note the memory consumption
  • Try running the same model on this branch, make sure it doesn't crash and note the memory consumption
  • Verify models that would usually fit in your RAM are not slowed down significantly.

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@chmjkb chmjkb marked this pull request as ready for review April 8, 2026 10:02
@chmjkb chmjkb linked an issue Apr 8, 2026 that may be closed by this pull request
@msluszniak msluszniak added the bug fix PRs that are fixing bugs label Apr 8, 2026
…m/LLM.cpp

Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>
Contributor

@NorbertKlockiewicz NorbertKlockiewicz left a comment


I wasn't able to crash Private Mind during model load, only during generation, and that was on an emulator with 2 GB of RAM. I'm only wondering whether we shouldn't also load vision encoders in a similar way, since those can also use a lot of RAM, and when they are loaded together with the text_decoder the app can crash.

@chmjkb
Collaborator Author

chmjkb commented Apr 8, 2026

I wasn't able to crash Private Mind during model load, only during generation, and that was on an emulator with 2 GB of RAM. I'm only wondering whether we shouldn't also load vision encoders in a similar way, since those can also use a lot of RAM, and when they are loaded together with the text_decoder the app can crash.

yeah, we do. I'll add the changes tomorrow.


Labels

bug fix PRs that are fixing bugs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prevent OOM-based app crashes

4 participants