fix vllm benchmark multi processing problem #127

Merged: yghstill merged 1 commit into Tencent:main from RuBing-Yang:spec_decode on Nov 4, 2025

@RuBing-Yang
Collaborator

This pull request removes Ray-based distributed processing from the vLLM benchmarking code and replaces it with Python's built-in multiprocessing module. The change affects both the Eagle and baseline answer generation workflows and removes Ray as a dependency of the benchmark. Worker processes now handle parallel execution across multiple GPUs, and writes to the answer file are protected by a lock so that concurrent processes do not race. Smaller changes improve error handling and device assignment.

Key changes by theme:

Migration from Ray to Multiprocessing:

  • All Ray dependencies and logic have been removed from benchmark_engine.py, generate_baseline_answer.py, and generate_eagle_answer.py. Instead, Python's multiprocessing is used for multi-GPU parallelism, including process spawning, locking, and shared result lists. [1] [2] [3]
  • The benchmark runner now splits work across processes, assigns GPUs using CUDA_VISIBLE_DEVICES, and synchronizes output file writes with a multiprocessing lock. [1] [2] [3] [4] [5] [6] [7]

File Handling and Concurrency:

  • Output directories for answer files are created if they do not exist, ensuring safe file output in both single and multi-process scenarios. [1] [2]
  • File writes are protected by a lock during multiprocessing to avoid race conditions, and results are aggregated via a shared list where needed. [1] [2]
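
As a rough sketch of the file-handling behavior described above, the helper below creates the output directory on demand and appends one JSON line per record, holding a lock only when one is supplied. The function name `append_answer` and the JSON Lines output format are assumptions for illustration; the PR's actual helpers may differ.

```python
import json
import os
from contextlib import nullcontext


def append_answer(answer_file, record, lock=None):
    """Append one record as a JSON line, optionally under a multiprocessing lock."""
    out_dir = os.path.dirname(answer_file)
    if out_dir:
        # Safe to call from several processes at once: exist_ok avoids an
        # error when another process created the directory first.
        os.makedirs(out_dir, exist_ok=True)
    # In single-process runs no lock is passed, so use a no-op context.
    guard = lock if lock is not None else nullcontext()
    with guard:
        with open(answer_file, "a", encoding="utf-8") as f:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Passing `lock=None` keeps the single-process path free of any multiprocessing machinery, while worker processes can share one `multiprocessing.Lock()` created by the parent.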

Error Handling and Logging:

  • Improved error handling in _reorg_answer_file to catch and log invalid JSON lines instead of crashing.
  • Minor logging improvements, such as correcting the environment variable name in device assignment logs.
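
The tolerant parsing described for `_reorg_answer_file` might look something like the sketch below. It assumes the answer file is JSON Lines and that the function de-duplicates and sorts records by a `question_id` field; neither detail is stated in this PR, so treat both, along with the name `reorg_answer_file`, as illustrative assumptions.

```python
import json
import logging

logger = logging.getLogger(__name__)


def reorg_answer_file(answer_file):
    """Sort and de-duplicate a JSONL answer file, skipping invalid lines.

    Returns the number of records kept.
    """
    answers = {}
    with open(answer_file, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
                # Assumption: records are keyed by "question_id".
                answers[record["question_id"]] = line
            except (json.JSONDecodeError, KeyError, TypeError) as exc:
                # Log and skip instead of aborting the whole run.
                logger.warning("Skipping invalid line %d in %s: %s",
                               line_no, answer_file, exc)
    with open(answer_file, "w", encoding="utf-8") as f:
        for qid in sorted(answers):
            f.write(answers[qid] + "\n")
    return len(answers)
```

A partially written line left behind by an interrupted worker is the kind of input this guards against: it is logged and dropped rather than raising from `json.loads`.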

API and Function Signature Updates:

  • Added optional lock, results_list, and device_list parameters to answer generation functions to support multiprocessing and GPU assignment. [1] [2]
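
One way to picture the signature change: the three new parameters default to `None`, so existing single-process callers are unaffected. The sketch below is hypothetical. The function name `generate_answers`, the placeholder answer logic, and the exact semantics of `device_list` (here, joined into `CUDA_VISIBLE_DEVICES`) are assumptions rather than the PR's actual code.

```python
import os


def generate_answers(questions, lock=None, results_list=None, device_list=None):
    """Hypothetical answer generator usable with or without multiprocessing."""
    if device_list is not None:
        # Restrict this process to the assigned GPUs (assumed semantics).
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(d) for d in device_list)

    # Placeholder for real model inference.
    answers = [{"question": q, "answer": q.upper()} for q in questions]

    if results_list is not None:
        # Publish results to the shared list, under the lock if one is given.
        if lock is not None:
            with lock:
                results_list.extend(answers)
        else:
            results_list.extend(answers)
    return answers
```

A single-process caller can simply use the return value of `generate_answers(questions)`, while a worker process would receive `lock`, `results_list`, and `device_list` from the parent.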

Code Simplification:

  • Standalone execution paths are now always single-process and simplified, as Ray-based distributed execution is no longer supported. [1] [2] [3] [4]

Together, these changes move the benchmarking workflow onto Python's standard multiprocessing module, removing the Ray dependency and making the code easier to maintain and run.

@yghstill yghstill merged commit b125081 into Tencent:main Nov 4, 2025
5 checks passed
dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026