Skip to content

feat: implement batch tokenization for TokenizeManager#117

Open
Alise-svg wants to merge 1 commit into
sgl-project:mainfrom
Alise-svg:feature/batch-tokenization
Open

feat: implement batch tokenization for TokenizeManager#117
Alise-svg wants to merge 1 commit into
sgl-project:mainfrom
Alise-svg:feature/batch-tokenization

Conversation

@Alise-svg
Copy link
Copy Markdown

  • Use tokenizer() for batch encoding plain texts
  • Use apply_chat_template() for batch processing chat templates
  • Remove padding tokens using attention mask
  • Preserve original message order
  • Add comprehensive unit tests for batch tokenization
2

@Alise-svg
Copy link
Copy Markdown
Author

Benchmark Description

Objective

Compare the performance of batch tokenization versus individual tokenization.

Methodology

  • Generate synthetic random texts and conversations
  • Test with different batch sizes (1, 5, 10, 20, 50, 100, 200)
  • Run each test 10 times and take the average
  • Test both plain text and chat template scenarios

Results
屏幕截图 2026-04-19 161904
Conclusion

  • Small batch (≤20): Batch processing has overhead, slightly slower than individual processing
  • Large batch (≥50): Batch processing shows clear advantage
  • At batch=200: Chat template achieves 1.71x speedup, Plain text achieves 1.85x speedup

Batch tokenization significantly improves throughput in high-concurrency scenarios.

@Alise-svg Alise-svg force-pushed the feature/batch-tokenization branch from a4fa5b6 to add3d38 Compare April 20, 2026 03:11
@DarkSharpness DarkSharpness added enhancement New feature or request duplicate This issue or pull request already exists labels May 10, 2026
@DarkSharpness
Copy link
Copy Markdown
Collaborator

Thanks for your contribution @Alise-svg . Might be a duplicate of #55. Could you take a look at that PR and compare with that?

  - Use tokenizer() for batch encoding plain texts
  - Use apply_chat_template() for batch processing chat templates
  - Remove padding tokens using attention mask
  - Preserve original message order
  - Add comprehensive unit tests for batch tokenization
@Alise-svg Alise-svg force-pushed the feature/batch-tokenization branch from add3d38 to 48047df Compare May 12, 2026 14:47
@Alise-svg
Copy link
Copy Markdown
Author

@DarkSharpness
Thanks for pointing out #55! I've reviewed it and there are key differences:
Batch apply_chat_template: My implementation uses the batch API of apply_chat_template(chat_convs, ...) which processes all conversations at once, while #55 calls it individually in a loop. This gives better performance for chat-heavy workloads with large batches.
Separate processing paths: Plain text and chat templates go through different HuggingFace APIs. Separating them allows each path to use its optimal batch API.
That said, #55's padding=False approach is cleaner. I can simplify my padding logic by adopting that pattern. Would you prefer me to revise based on #55's approach but add batch apply_chat_template support?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants