feat: implement batch tokenization for TokenizeManager by Alise-svg · Pull Request #117 · sgl-project/mini-sglang

Alise-svg · 2026-04-19T06:40:13Z

Use tokenizer() to batch-encode plain texts
Use apply_chat_template() to batch-process chat templates
Remove padding tokens using the attention mask
Preserve the original message order
Add comprehensive unit tests for batch tokenization
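The padding-removal step can be sketched as follows. This is a minimal illustration, not the PR's code. It assumes the batch came from a call like `tokenizer(texts, padding=True)` with no `return_tensors`, so `input_ids` and `attention_mask` are plain lists of lists. The name `strip_padding` is hypothetical.

```python
from typing import List


def strip_padding(input_ids: List[List[int]], attention_mask: List[List[int]]) -> List[List[int]]:
    """Drop padded positions from a batch of token id sequences.

    Filters on the mask value (1 = real token, 0 = padding) instead of
    slicing by length, so it works for both left and right padding.
    """
    if len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask must have the same batch size")
    stripped: List[List[int]] = []
    for ids, mask in zip(input_ids, attention_mask):
        if len(ids) != len(mask):
            raise ValueError("each sequence must have the same length as its mask")
        # Keep only the positions the mask marks as real tokens.
        stripped.append([tok for tok, m in zip(ids, mask) if m == 1])
    return stripped
```

For example, `strip_padding([[5, 6, 0], [7, 8, 9]], [[1, 1, 0], [1, 1, 1]])` returns `[[5, 6], [7, 8, 9]]`, and the order of sequences in the batch is unchanged.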

Alise-svg · 2026-04-19T08:21:24Z

Benchmark Description

Objective

Compare the performance of batch tokenization versus individual tokenization.

Methodology

Generate synthetic random texts and conversations
Test with different batch sizes (1, 5, 10, 20, 50, 100, 200)
Run each test 10 times and take the average
Test both plain text and chat template scenarios
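The methodology above (run each configuration 10 times and average) can be sketched as a small timing harness. The helpers `average_runtime` and `compare` are hypothetical; the PR's actual benchmark script is not shown, and the `individual` and `batched` callables stand in for whatever tokenization calls it timed.

```python
import time
from typing import Callable, Sequence


def average_runtime(fn: Callable[[], object], runs: int = 10) -> float:
    """Call fn `runs` times and return the mean wall-clock time in seconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / runs


def compare(
    individual: Callable[[str], object],
    batched: Callable[[Sequence[str]], object],
    inputs: Sequence[str],
    runs: int = 10,
) -> float:
    """Return individual time divided by batched time (> 1 means batching is faster)."""
    # Individual path: one call per input, as in a per-request loop.
    t_individual = average_runtime(lambda: [individual(x) for x in inputs], runs)
    # Batched path: a single call over the whole batch.
    t_batched = average_runtime(lambda: batched(inputs), runs)
    if t_batched == 0.0:
        return float("inf")
    return t_individual / t_batched
```

Repeating `compare` for each batch size (1, 5, 10, 20, 50, 100, 200) and for both plain-text and chat inputs would reproduce the shape of the experiment.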

Results

Conclusion

Small batches (≤20): batch processing adds overhead and is slightly slower than individual processing.
Large batches (≥50): batch processing shows a clear advantage.
At batch=200: the chat template path achieves a 1.71x speedup and the plain text path a 1.85x speedup.
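Assuming speedup is defined as individual time divided by batch time (the PR does not state the definition), the batch=200 figures translate into the fraction of the individual run time that batching needs:

```latex
\text{speedup} = \frac{T_{\text{individual}}}{T_{\text{batch}}}
\quad\Longrightarrow\quad
\frac{T_{\text{batch}}}{T_{\text{individual}}} = \frac{1}{\text{speedup}}

\text{chat template: } \frac{1}{1.71} \approx 0.585
\qquad
\text{plain text: } \frac{1}{1.85} \approx 0.541
```

So at batch=200, batched tokenization takes roughly 58% (chat) and 54% (plain text) of the individual time. A speedup below 1 in the small-batch runs means the batched path took longer.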

Batch tokenization significantly improves throughput in high-concurrency scenarios, where batches are large enough for the batching overhead to pay off.

DarkSharpness · 2026-05-10T10:19:10Z

Thanks for your contribution, @Alise-svg. This might be a duplicate of #55. Could you take a look at that PR and compare it with yours?


Alise-svg · 2026-05-16T14:13:55Z

@DarkSharpness
Thanks for pointing out #55! I've reviewed it, and there are two key differences:
Batch apply_chat_template: my implementation calls apply_chat_template(chat_convs, ...) once with the whole list of conversations, processing them all in a single call, whereas #55 calls it individually in a loop. This gives better performance for chat-heavy workloads with large batches.
Separate processing paths: plain text and chat templates go through different HuggingFace APIs, so separating them lets each path use its optimal batch API.
That said, #55's padding=False approach is cleaner, and I can simplify my padding logic by adopting that pattern. Would you prefer that I revise based on #55's approach and add batch apply_chat_template support on top of it?
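The two-path design described in this comment, combined with the padding=False pattern attributed to #55, could look roughly like this. It is a sketch under stated assumptions, not code from either PR: `batch_tokenize` and `Message` are hypothetical names, mini-sglang's `TokenizeManager` internals are not shown here, and `add_generation_prompt=True` is an assumed setting. It relies on HuggingFace `transformers` accepting a list of strings in `tokenizer(...)` and, in versions that support batched chat templates, a list of conversations in `apply_chat_template`.

```python
from typing import Dict, List, Sequence, Union

# Hypothetical alias: a request is either a plain prompt or a chat conversation.
Message = Union[str, List[Dict[str, str]]]


def batch_tokenize(tokenizer, messages: Sequence[Message]) -> List[List[int]]:
    """Tokenize a mixed batch with one call per path, preserving input order."""
    text_idx: List[int] = []
    texts: List[str] = []
    chat_idx: List[int] = []
    convs: List[List[Dict[str, str]]] = []
    # Route each message to its path and remember where it came from.
    for i, msg in enumerate(messages):
        if isinstance(msg, str):
            text_idx.append(i)
            texts.append(msg)
        else:
            chat_idx.append(i)
            convs.append(msg)

    results: List[List[int]] = [[] for _ in messages]
    if texts:
        # One batched call; padding=False keeps every sequence at its true length,
        # so no attention-mask stripping is needed afterwards.
        encoded = tokenizer(texts, padding=False)["input_ids"]
        for i, ids in zip(text_idx, encoded):
            results[i] = list(ids)
    if convs:
        # Passing a list of conversations makes apply_chat_template process them as a batch.
        encoded = tokenizer.apply_chat_template(
            convs, tokenize=True, add_generation_prompt=True, return_dict=False
        )
        for i, ids in zip(chat_idx, encoded):
            results[i] = list(ids)
    # Results are written back by original index, so output order matches input order.
    return results
```

The design makes at most two tokenizer calls per batch regardless of batch size, which is where the large-batch speedup reported above would come from.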

Alise-svg force-pushed the feature/batch-tokenization branch from a4fa5b6 to add3d38 on April 20, 2026 03:11

DarkSharpness added the enhancement (new feature or request) and duplicate (this issue or pull request already exists) labels on May 10, 2026

Alise-svg force-pushed the feature/batch-tokenization branch from add3d38 to 48047df on May 12, 2026 14:47

Alise-svg wants to merge 1 commit into sgl-project:main from Alise-svg:feature/batch-tokenization
