Every attempt to run inference with the Gemma 4 E2B .litertlm model crashes the app on iOS with a SIGSEGV. The crash is deterministic and 100% reproducible on my iOS device, but I do not see the same issue on an Android emulator.
Environment:
- flutter_gemma: 0.13.2
- Model: Gemma 4 E2B (gemma-4-E2B-it.litertlm from litert-community HuggingFace)
- Device: iPhone 16 Pro Max
- OS: iPhone OS 26.2 (23C55) — beta
- Flutter: 3.41.6 - stable
The crash is in MediaPipe's native LlmLiteRTExecutor::PrefillInternal. memset is called with a null destination pointer (x0 = 0x0, size x2 = 0x1000), which suggests a tensor output buffer is null. It looks like the C++ code has no null guard before the memset, but I'm not sure why the tensor output buffer would be null to begin with.
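To make the "no null guard" point concrete, here is a minimal sketch of the kind of check I would expect before the memset. This is not MediaPipe code: ZeroFillChecked is a hypothetical helper I made up for illustration, and I have not seen the actual source of PrefillInternal, so where such a guard would belong there is an assumption.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of MediaPipe or LiteRT): zero-fills a buffer
// only when the destination pointer is valid, and reports failure to the
// caller instead of faulting inside memset.
bool ZeroFillChecked(void* dst, std::size_t size) {
  if (dst == nullptr) {
    // A null buffer is only acceptable when there is nothing to clear.
    return size == 0;
  }
  std::memset(dst, 0, size);
  return true;
}
```

With the values from the crash (dst = 0x0, size = 0x1000) this returns false, so the caller could surface an error to the plugin instead of taking the process down. It would only hide the symptom, though; the real question is still why the buffer was never allocated.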
Crash stack (Thread 18 — crashed):
0 libsystem_platform.dylib _platform_memset + 108
1 Runner odml::infra::LlmLiteRTExecutor::PrefillInternal(tflite::impl::SignatureRunner*, absl::Span<int const>, bool) + 852
2 Runner odml::infra::LlmLiteRTExecutor::Prefill(litert::lm::ExecutorInputs const&, litert::lm::ExecutorPrefillParams const&) + 1956
3 Runner odml::infra::LockedLlmExecutor::Prefill(...) + 304
4 Runner odml::infra::(anonymous namespace)::LlmExecutorCalculator::Process(mediapipe::CalculatorContext*) + 1884
5 Runner mediapipe::CalculatorNode::ProcessNode(...)
6 Runner mediapipe::internal::SchedulerQueue::RunCalculatorNode(...)
7 Runner mediapipe::internal::SchedulerQueue::RunNextTask()
8 Runner mediapipe::ThreadPool::RunWorker() + 128
9 Runner mediapipe::ThreadPool::WorkerThread::ThreadBody(void*)
ARM Thread State at crash:
x0: 0x0000000000000000 ← null destination for memset
x1: 0x0000000000000000 ← fill value (0)
x2: 0x0000000000001000 ← size (4096 bytes)
esr: 0x92000046 (Data Abort) byte write Translation fault
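For anyone who wants to double-check the esr value, the fields can be pulled out by hand. The sketch below follows the ARMv8-A ESR_ELx layout (exception class in bits 31:26, WnR in bit 6, fault status code in bits 5:0); the struct and function names are my own and purely illustrative.

```cpp
#include <cstdint>

// Illustrative names, not from any Apple or ARM header.
struct EsrFields {
  std::uint32_t ec;    // exception class, bits [31:26]
  bool is_write;       // WnR, bit 6 (meaningful for data aborts)
  std::uint32_t dfsc;  // data fault status code, bits [5:0]
};

// Splits a raw ESR value into the fields relevant to a data abort.
inline EsrFields DecodeEsr(std::uint32_t esr) {
  EsrFields f;
  f.ec = (esr >> 26) & 0x3F;
  f.is_write = ((esr >> 6) & 0x1) != 0;
  f.dfsc = esr & 0x3F;
  return f;
}
```

For 0x92000046 this gives ec = 0x24 (data abort from a lower exception level), is_write = true, and dfsc = 0x06 (translation fault), which matches the "byte write Translation fault" text in the report and is consistent with a write to the unmapped page at address 0.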
Dart/Swift call chain that triggered it (Thread 11):
InferenceSession.generateResponse(prompt:) (InferenceModel.swift:144)
closure #1 in PlatformServiceImpl.generateResponse(completion:) (FlutterGemmaPlugin.swift:239)
Flutter usage:
await FlutterGemma.installModel(
  modelType: ModelType.gemmaIt,
  fileType: ModelFileType.litertlm,
).fromNetwork(
  'https://huggingface.co/litert-community/gemma-4-E2B-it-litert-lm/resolve/main/gemma-4-E2B-it.litertlm',
).install();
final model = await FlutterGemma.getActiveModel(maxTokens: 4096);
final session = await model.createSession(temperature: 0.7, topK: 40);
await session.addQueryChunk(Message.text(text: prompt, isUser: true));
final response = await session.getResponse();
What I've tried:
- Confirmed the model file downloads successfully (no 401, file is valid)
- Increased maxTokens from 1024 to 4096 -- crash persists
- Reproduced across two separate builds
Expected behavior: Inference completes and returns a response.
Actual behavior: App crashes with SIGSEGV. LlmLiteRTExecutor::PrefillInternal calls memset on a null tensor buffer with no null check.
Notes:
- The .litertlm format uses LlmLiteRTExecutor. I have not tested a .task format model on this device, so it's possible this is specific to the LiteRT executor path.
- The crash is in native MediaPipe C++ code, and I don't believe it can be caught by Flutter/Dart error handling. That honestly wouldn't help me much anyway, as the app revolves around Gemma usage :)