Implement Sequence ID pooling in BatchedExecutor to prevent native SeqMax overflow and crashes by zsogitbe · Pull Request #1386 · SciSharp/LLamaSharp

zsogitbe · 2026-05-11T04:57:45Z

Description

The Problem
Under sustained traffic or in continuous-batching scenarios, the BatchedExecutor eventually causes the native llama.cpp backend to deadlock, reject tokens, or crash with a segmentation fault.

The Root Cause
Currently, BatchedExecutor assigns sequence IDs from a strictly incrementing counter (_nextSequenceId++). When a Conversation is disposed, its tokens are correctly removed from the KV cache via MemorySequenceRemove, but its sequence ID is never reused.

In llama.cpp, the SeqMax (n_seq_max) parameter determines the size of arrays that are allocated up front within the native KV cache. The backend expects sequence IDs to be strictly bounded array indices, in the range 0 to SeqMax - 1. Once the C# _nextSequenceId counter reaches SeqMax, passing that ID to the native backend results in an out-of-bounds memory access.
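The failure mode can be reproduced with a small model. This is a Python sketch, not LLamaSharp code: `IncrementingIdExecutor`, `native_slots`, and its methods are hypothetical names, and a Python list of length `seq_max` stands in for the native arrays sized by `n_seq_max`. Only one conversation is ever alive at a time, yet after `seq_max` create/dispose cycles the next ID already falls outside the array.

```python
class IncrementingIdExecutor:
    """Hypothetical model of the pre-fix behaviour: IDs only ever grow."""

    def __init__(self, seq_max: int) -> None:
        # Stand-in for the native per-sequence arrays sized by n_seq_max.
        self.native_slots = [None] * seq_max
        self._next_sequence_id = 0

    def create_conversation(self) -> int:
        seq_id = self._next_sequence_id
        self._next_sequence_id += 1
        # Python raises IndexError once seq_id == seq_max.
        self.native_slots[seq_id] = "in use"
        return seq_id

    def dispose_conversation(self, seq_id: int) -> None:
        # The slot is cleared, but the ID itself is never handed out again.
        self.native_slots[seq_id] = None


executor = IncrementingIdExecutor(seq_max=4)
for _ in range(4):
    executor.dispose_conversation(executor.create_conversation())
# At most one conversation was ever alive, yet the next ID is 4 == seq_max.
```

Python surfaces the bad index as an `IndexError`; native code performs no such check, so the same index reads or writes past the end of the allocation.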

The Solution
This PR replaces the strictly incrementing counter with a Sequence ID pool.

BatchedExecutor: Updated to track active IDs (e.g., a HashSet with synchronized access, or a similar thread-safe structure) and to assign the lowest available sequence ID to each new conversation. Added a ReleaseSequenceId method that returns an ID to the pool.
Conversation: Updated the Dispose() method to call Executor.ReleaseSequenceId(ConversationId) after clearing the KV cache.
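The pooling scheme can be sketched as follows. This is a language-agnostic model written in Python for brevity; the actual change is C# in LLama/Batched/BatchedExecutor.cs. `SequenceIdPool`, `acquire`, `release`, and the dict standing in for the KV cache are hypothetical names, not LLamaSharp APIs; only the behaviour (hand out the lowest free ID, release it after the sequence is cleared from the KV cache) comes from the description above. Raising when every slot is taken is an assumption: the description does not say what the executor does when all SeqMax IDs are in use.

```python
import threading


class SequenceIdPool:
    """Hypothetical pool handing out the lowest free sequence ID in [0, seq_max)."""

    def __init__(self, seq_max: int) -> None:
        self._seq_max = seq_max
        self._active: set[int] = set()
        # A plain set is not thread-safe, so every access goes through a lock.
        self._lock = threading.Lock()

    def acquire(self) -> int:
        with self._lock:
            # Scan upward so the lowest unused index is always handed out first.
            for seq_id in range(self._seq_max):
                if seq_id not in self._active:
                    self._active.add(seq_id)
                    return seq_id
        # Assumption: fail loudly rather than hand the backend an index >= seq_max.
        raise RuntimeError(f"all {self._seq_max} sequence IDs are in use")

    def release(self, seq_id: int) -> None:
        with self._lock:
            # discard() ignores IDs that are not currently active.
            self._active.discard(seq_id)


class Conversation:
    """Hypothetical stand-in for a conversation that owns one sequence ID."""

    def __init__(self, pool: SequenceIdPool, kv_cache: dict[int, list[int]]) -> None:
        self._pool = pool
        self._kv_cache = kv_cache
        self.conversation_id = pool.acquire()
        self._kv_cache[self.conversation_id] = []

    def dispose(self) -> None:
        # Remove this sequence from the KV cache first, then return the ID,
        # matching the order described for Conversation.Dispose().
        self._kv_cache.pop(self.conversation_id, None)
        self._pool.release(self.conversation_id)


pool = SequenceIdPool(seq_max=2)
kv_cache = {}
first = Conversation(pool, kv_cache)   # ID 0
second = Conversation(pool, kv_cache)  # ID 1
first.dispose()                        # ID 0 becomes free again
third = Conversation(pool, kv_cache)   # reuses ID 0, so IDs stay below seq_max
```

A linear scan over the slots is cheap when SeqMax is small; a min-heap of free IDs would give the same lowest-first order with logarithmic acquire cost if SeqMax were large.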

Benefits

Native Memory Safety: Guarantees that sequence IDs always stay below the configured SeqMax limit, as long as the number of concurrent conversations does not exceed SeqMax.
Indefinite Uptime: Allows the BatchedExecutor to run continuously under high-traffic workloads, without forcing the host application to take the risky and expensive step of destroying and recreating the large native context just to reset the ID counter.


martindevans

Looks good, thanks for fixing this.

zsogitbe · 2026-05-15T07:32:14Z

Martin, when you merge this, could you please also check the "llama.cpp @ 73c9eb8" submodule link in the master branch? I think you forgot to update it to the version used in the previous main update. Thanks a lot!

martindevans · 2026-05-15T12:10:03Z

> could you please also check the [submodule]

Oops, I keep forgetting to do that. I fixed it in #1387.

Implement Sequence ID pooling in BatchedExecutor to prevent native SeqMax overflow and crashes

702adf1

martindevans reviewed May 14, 2026

Comment thread LLama/Batched/BatchedExecutor.cs

martindevans approved these changes May 14, 2026

martindevans merged commit ecd1849 into SciSharp:master May 15, 2026
11 of 12 checks passed

martindevans merged 1 commit into SciSharp:master from zsogitbe:SequenceIDPooling

2 participants