fix vllm benchmark multi processing problem by RuBing-Yang · Pull Request #127 · Tencent/AngelSlim

RuBing-Yang · 2025-11-04T10:41:59Z

This pull request removes Ray-based distributed processing from the vLLM benchmarking code and replaces it with Python's built-in multiprocessing module. The update affects both the Eagle and baseline answer generation workflows, improving compatibility and simplifying the codebase. Multiprocessing now handles parallel execution across multiple GPUs, and file writes are guarded by a lock so that concurrent processes can write safely. Additional minor improvements cover error handling and device assignment.

Key changes by theme:

Migration from Ray to Multiprocessing:

All Ray dependencies and logic have been removed from benchmark_engine.py, generate_baseline_answer.py, and generate_eagle_answer.py. Instead, Python's multiprocessing is used for multi-GPU parallelism, including process spawning, locking, and shared result lists.
The benchmark runner now splits work across processes, assigns GPUs using CUDA_VISIBLE_DEVICES, and synchronizes output file writes with a multiprocessing lock.
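The split, assign, and lock steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `split_evenly`, `worker`, and `run_parallel` are hypothetical names, the `spawn` start method is an assumption (it is a common choice when child processes use CUDA), and the generation step is replaced by a placeholder.

```python
import json
import multiprocessing as mp
import os


def split_evenly(items, num_parts):
    """Split items into num_parts contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_parts)
    chunks, start = [], 0
    for i in range(num_parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def worker(gpu_ids, questions, answer_file, lock):
    """Hypothetical per-process entry point: pin GPUs, then write answers under a lock."""
    # Set this before importing any CUDA-using library in the child process.
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpu_ids)
    for question_id in questions:
        # Placeholder for the real generation call.
        line = json.dumps({"question_id": question_id}) + "\n"
        with lock:  # only one process appends at a time
            with open(answer_file, "a", encoding="utf-8") as f:
                f.write(line)


def run_parallel(questions, gpu_groups, answer_file):
    """Start one process per GPU group and wait for all of them to finish."""
    ctx = mp.get_context("spawn")  # assumption: spawn rather than fork
    lock = ctx.Lock()
    chunks = split_evenly(questions, len(gpu_groups))
    procs = [
        ctx.Process(target=worker, args=(gpus, chunk, answer_file, lock))
        for gpus, chunk in zip(gpu_groups, chunks)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

With the `spawn` start method, `run_parallel` should be called from under an `if __name__ == "__main__":` guard, for example `run_parallel(list(range(80)), [[0], [1]], "answers.jsonl")` for two single-GPU workers.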

File Handling and Concurrency:

Output directories for answer files are created if they do not exist, ensuring safe file output in both single and multi-process scenarios.
File writes are protected by a lock during multiprocessing to avoid race conditions, and results are aggregated via a shared list where needed.
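A small sketch of how one write path can serve both the single-process and multi-process cases. The helper name `append_answer` is hypothetical, and the `lock` and `results_list` arguments only mirror the optional parameters described in this PR; the real functions may be structured differently.

```python
import json
import os
from contextlib import nullcontext


def append_answer(answer_file, record, lock=None, results_list=None):
    """Append one JSON record to answer_file, optionally under a lock.

    lock and results_list default to None so the same function works
    when no multiprocessing is involved.
    """
    out_dir = os.path.dirname(answer_file)
    if out_dir:
        # exist_ok avoids an error when another process created the directory first.
        os.makedirs(out_dir, exist_ok=True)

    line = json.dumps(record, ensure_ascii=False) + "\n"
    guard = lock if lock is not None else nullcontext()
    with guard:
        with open(answer_file, "a", encoding="utf-8") as f:
            f.write(line)

    if results_list is not None:
        # In the multi-process case this could be a multiprocessing.Manager().list().
        results_list.append(record)
```

In a multi-process run, the parent would create the lock and the shared list once and pass the same objects to every worker.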

Error Handling and Logging:

_reorg_answer_file now catches and logs invalid JSON lines instead of crashing.
Minor logging fixes, such as correcting the environment variable name in the device assignment logs.
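The tolerant parsing described above might look like the sketch below. It is not the actual `_reorg_answer_file`: the function name `reorg_answer_file_tolerant` is hypothetical, and it assumes a JSON Lines answer file keyed by a `question_id` field, where a later record with the same ID replaces an earlier one.

```python
import json
import logging

logger = logging.getLogger(__name__)


def reorg_answer_file_tolerant(answer_file):
    """Rewrite answer_file sorted by question_id, skipping lines that fail to parse.

    Returns the number of records kept.
    """
    answers = {}
    with open(answer_file, "r", encoding="utf-8") as f:
        for line_no, raw in enumerate(f, start=1):
            line = raw.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                qid = record["question_id"]  # assumed key name
            except (json.JSONDecodeError, KeyError, TypeError) as exc:
                # Log and move on instead of aborting the whole reorganization.
                logger.warning("Skipping invalid line %d in %s: %s", line_no, answer_file, exc)
                continue
            answers[qid] = line  # a later duplicate replaces an earlier one

    with open(answer_file, "w", encoding="utf-8") as f:
        for qid in sorted(answers):
            f.write(answers[qid] + "\n")
    return len(answers)
```

A partially written line, such as one left behind by an interrupted process, is then reported in the log rather than raising an exception.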

API and Function Signature Updates:

Added optional lock, results_list, and device_list parameters to answer generation functions to support multiprocessing and GPU assignment.

Code Simplification:

Standalone execution paths are now always single-process and simplified, as Ray-based distributed execution is no longer supported.

These changes collectively modernize the benchmarking workflow to use standard Python multiprocessing, making the codebase easier to maintain and run in diverse environments.

fix vllm benchmark multi processing problem

9c886cf

RuBing-Yang requested review from liusong1222 and yghstill November 4, 2025 10:43

yghstill approved these changes Nov 4, 2025

liusong1222 approved these changes Nov 4, 2025

yghstill merged commit b125081 into Tencent:main Nov 4, 2025
5 checks passed

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

fix vllm benchmark multi processing problem (Tencent#127)

b10d048

yghstill merged 1 commit into Tencent:main from RuBing-Yang:spec_decode

3 participants
