Implement Sequence ID pooling in BatchedExecutor to prevent native SeqMax overflow and crashes #1386

Merged
martindevans merged 1 commit into SciSharp:master from zsogitbe:SequenceIDPooling
May 15, 2026

Conversation

@zsogitbe

Contributor

Description

The Problem
Under sustained traffic or continuous batching, the BatchedExecutor eventually causes the native llama.cpp backend to deadlock, reject tokens, or crash with a segmentation fault.

The Root Cause
Currently, BatchedExecutor assigns sequence IDs using a strictly incrementing counter (_nextSequenceId++). When a Conversation is disposed, its tokens are properly removed from the KV cache via MemorySequenceRemove, but its Sequence ID is permanently abandoned.

In llama.cpp, the SeqMax (n_seq_max) parameter dictates the static allocation of arrays within the native KV cache. It expects sequence IDs to act as strictly bounded array indices (from 0 to SeqMax - 1). Once the C# _nextSequenceId counter reaches SeqMax, the ID it hands out falls outside that range, and passing that ID to the native backend results in an out-of-bounds memory access.
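
To make the failure mode concrete, here is a minimal sketch in plain C# that models a strictly incrementing counter against a fixed upper bound. It uses no LLamaSharp types; `NaiveIdAllocator`, `OverflowDemo`, and `seqMax` are hypothetical names chosen for illustration, and the bound check only stands in for the native array indexing described above.

```csharp
using System;

// Hypothetical model of the old behaviour: IDs only ever go up.
public sealed class NaiveIdAllocator
{
    private int _nextSequenceId;

    // Hands out 0, 1, 2, ... and never reuses a value.
    public int Next() => _nextSequenceId++;
}

public static class OverflowDemo
{
    // Counts how many conversations can be created before an ID
    // falls outside the valid range [0, seqMax).
    public static int ConversationsUntilOverflow(int seqMax)
    {
        var allocator = new NaiveIdAllocator();
        var created = 0;
        while (true)
        {
            var id = allocator.Next();
            // Every conversation is treated as disposed right away,
            // yet its ID is never handed back to the allocator.
            if (id >= seqMax)
                return created;
            created++;
        }
    }
}
```

With `seqMax` set to 4, the fifth conversation receives ID 4, which is out of range even though no other conversation is alive at that point.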

The Solution
This PR replaces the strictly incrementing counter with a Sequence ID pool.

  1. BatchedExecutor: Updated to track active IDs (e.g., using a thread-safe HashSet or similar structure) and assign the lowest available sequence ID. Added a ReleaseSequenceId method.
  2. Conversation: Updated the Dispose() method to call Executor.ReleaseSequenceId(ConversationId) after clearing the KV cache.
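
The pooling idea can be sketched as follows. This is not the code merged in this PR: `SequenceIdPool`, `Acquire`, and `Release` are hypothetical names, and the sketch tracks released IDs in a `SortedSet<int>` behind a lock rather than tracking active IDs in a hash set. It illustrates the same contract: always hand out the lowest free ID, and refuse to go past `seqMax`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a sequence ID pool; not the LLamaSharp implementation.
public sealed class SequenceIdPool
{
    private readonly object _lock = new object();
    // IDs below _nextUnused that have been handed back and are free again.
    private readonly SortedSet<int> _released = new SortedSet<int>();
    private readonly int _seqMax;
    // Smallest ID that has never been handed out.
    private int _nextUnused;

    public SequenceIdPool(int seqMax)
    {
        if (seqMax <= 0) throw new ArgumentOutOfRangeException(nameof(seqMax));
        _seqMax = seqMax;
    }

    // Returns the lowest free ID, or throws if all seqMax IDs are in use.
    public int Acquire()
    {
        lock (_lock)
        {
            if (_released.Count > 0)
            {
                // Released IDs are always smaller than _nextUnused,
                // so the minimum here is the lowest free ID overall.
                var id = _released.Min;
                _released.Remove(id);
                return id;
            }
            if (_nextUnused >= _seqMax)
                throw new InvalidOperationException("All sequence IDs are in use.");
            return _nextUnused++;
        }
    }

    // Hands an ID back so that a later Acquire call can reuse it.
    public void Release(int id)
    {
        lock (_lock)
        {
            // Reject IDs that were never issued or were already released.
            if (id < 0 || id >= _nextUnused || !_released.Add(id))
                throw new ArgumentException("ID is not currently in use.", nameof(id));
        }
    }
}
```

In this sketch, a conversation would call `Acquire` when it is created and `Release` from `Dispose()` after its KV cache entries have been cleared, so an ID is never reused while its tokens are still cached.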

Benefits

  • Native Memory Safety: Guarantees that sequence IDs stay below the configured SeqMax limit as long as the number of concurrent conversations does not exceed SeqMax.
  • Indefinite Uptime: Allows the BatchedExecutor to run continuously under high-traffic workloads without requiring the host application to destroy and recreate the large native context just to reset the ID counter.

Comment thread LLama/Batched/BatchedExecutor.cs

@martindevans martindevans left a comment

Member

Looks good, thanks for fixing this.

@zsogitbe

Contributor Author

Martin, when you merge this, could you please also check the "llama.cpp @ 73c9eb8" link in the master branch? I think you forgot to update it to the version used in the previous main update. Thanks a lot!

@martindevans

Member

> could you please also check the [submodule]

Oops, I keep forgetting to do that. I fixed it in #1387

@martindevans martindevans merged commit ecd1849 into SciSharp:master May 15, 2026
11 of 12 checks passed

2 participants