feat: add compare_with_vllm.py example-03#38

Merged
viraatc merged 5 commits into main from feature/viraatc-vllm-compare-example
Dec 16, 2025

Conversation

@viraatc
Collaborator

@viraatc viraatc commented Dec 3, 2025

What does this PR do?

Adds an example script that runs both vLLM's benchmark and inference-endpoint against a given endpoint URL and compares their metrics:

python examples/03_BenchmarkComparison/compare_with_vllm.py --model Qwen/Qwen2.5-0.5B-Instruct --endpoint <endpoint url>:<endpoint port> 

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Copilot AI review requested due to automatic review settings December 3, 2025 23:33
@viraatc viraatc requested a review from a team as a code owner December 3, 2025 23:33
@github-actions

github-actions Bot commented Dec 3, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a new example to facilitate direct performance comparisons between the inference-endpoint tool and vLLM for large language models. It provides a robust Python script and accompanying documentation that enables users to benchmark and evaluate the efficiency of different inference serving solutions under consistent conditions, offering insights into their respective throughput and latency characteristics.

Highlights

  • New Benchmarking Example: Introduced a new example script, compare_with_vllm.py, designed to compare the performance metrics of inference-endpoint against vLLM's benchmarking tool.
  • Comprehensive Documentation: Added a detailed README.md for the new 03_BenchmarkComparison example, outlining prerequisites, usage instructions, available command-line options, and an example of the comparison output.
  • Automated Metric Comparison: The Python script automates the execution of both benchmarking tools, parses their respective outputs using regular expressions, and presents a clear, tabular comparison of key performance indicators like throughput, TTFT (Time To First Token), and TPOT (Time Per Output Token).
  • Server Warmup Mechanism: Included a server warmup function to ensure the inference server is fully operational and responsive before commencing the actual benchmarks, improving the reliability of performance measurements.
  • Standardized Datasets: Provided two identical JSONL datasets (dummy_prompts_ie.jsonl and dummy_prompts_vllm.jsonl), each formatted specifically for inference-endpoint and vLLM respectively, to ensure a fair comparison using the same set of prompts.
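The regex-based metric parsing and tabular comparison described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the raw output strings, regex patterns, metric names, and table layout below are all assumptions.

```python
import re

# Hypothetical raw stdout from each benchmarking tool; the real formats
# printed by vLLM's benchmark and inference-endpoint will differ.
VLLM_OUTPUT = """
Output token throughput (tok/s): 512.3
Mean TTFT (ms): 41.7
Mean TPOT (ms): 9.8
"""

IE_OUTPUT = """
Output token throughput (tok/s): 498.1
Mean TTFT (ms): 44.2
Mean TPOT (ms): 10.1
"""

# One regex per metric; group 1 captures the numeric value.
PATTERNS = {
    "Throughput (tok/s)": r"Output token throughput \(tok/s\):\s*([\d.]+)",
    "Mean TTFT (ms)": r"Mean TTFT \(ms\):\s*([\d.]+)",
    "Mean TPOT (ms)": r"Mean TPOT \(ms\):\s*([\d.]+)",
}

def parse_metrics(text):
    """Extract each metric's value from a tool's stdout, or None if absent."""
    out = {}
    for name, pat in PATTERNS.items():
        m = re.search(pat, text)
        out[name] = float(m.group(1)) if m else None
    return out

def comparison_table(vllm, ie):
    """Render a simple aligned table comparing the two result dicts.

    Assumes every metric was found; a real script would handle None.
    """
    rows = [f"{'Metric':<22}{'vLLM':>10}{'inference-endpoint':>20}"]
    for name in PATTERNS:
        rows.append(f"{name:<22}{vllm[name]:>10.1f}{ie[name]:>20.1f}")
    return "\n".join(rows)

if __name__ == "__main__":
    print(comparison_table(parse_metrics(VLLM_OUTPUT), parse_metrics(IE_OUTPUT)))
```

Parsing both outputs with the same pattern table keeps the comparison symmetric even if the two tools label their metrics differently.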


Copilot AI left a comment


Pull request overview

This PR adds a benchmark comparison example that allows users to compare performance metrics between inference-endpoint and vLLM's benchmarking tools using identical prompts.

Key Changes:

  • Adds a new example script that benchmarks both vLLM and inference-endpoint on the same dataset
  • Includes pre-generated JSONL datasets in formats compatible with each tool
  • Provides comprehensive documentation with usage examples
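The server warmup called out in the PR highlights (waiting until the endpoint responds before benchmarking) might look like this minimal sketch. The `/health` path, timeout, and polling interval are assumptions; the actual script's probe is not shown in this thread.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url, timeout_s=120.0, poll_s=2.0):
    """Poll a health endpoint until the server answers, or raise on timeout.

    The '/health' path is an assumption; many OpenAI-compatible servers
    (including vLLM) expose one, but the real script may probe differently.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"server at {base_url} not ready after {timeout_s}s")
```

Warming up before measuring matters because the first requests after model load pay one-off costs (CUDA graph capture, cache allocation) that would otherwise skew TTFT.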

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • examples/03_BenchmarkComparison/compare_with_vllm.py: Main comparison script that runs both benchmarks and displays comparative metrics
  • examples/03_BenchmarkComparison/dummy_prompts_vllm.jsonl: Dataset file with 96 prompts in vLLM format ({"prompt": "..."})
  • examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl: Dataset file with 96 prompts in inference-endpoint format ({"text_input": "..."})
  • examples/03_BenchmarkComparison/README.md: Documentation explaining prerequisites, usage, and expected output
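The two dataset files differ only in the JSON key used on each line. A small sketch of converting between the formats (the file names match the PR, but this conversion helper is illustrative, not part of the PR):

```python
import json

def vllm_to_ie(vllm_lines):
    """Convert vLLM-format JSONL lines ({"prompt": ...}) to
    inference-endpoint format ({"text_input": ...})."""
    out = []
    for line in vllm_lines:
        if line := line.strip():  # skip blank lines
            record = json.loads(line)
            out.append(json.dumps({"text_input": record["prompt"]}))
    return out

# Example: one vLLM-format line and its inference-endpoint counterpart.
vllm_sample = ['{"prompt": "Explain KV caching in one paragraph."}']
ie_sample = vllm_to_ie(vllm_sample)
```

Keeping the prompt text byte-identical across both files is what makes the benchmark a fair comparison; only the key name changes.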


Comment thread examples/03_BenchmarkComparison/dummy_prompts_vllm.jsonl Outdated
Comment thread examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl Outdated

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new example script for comparing the performance of inference-endpoint with vllm. The script is well-structured and includes helpful features like a dry-run mode and server warmup. The accompanying README provides clear instructions. I've made a few suggestions to improve code compatibility, robustness, and maintainability. Additionally, I noticed a duplicate prompt in the dummy datasets.

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl Outdated
Collaborator

@arekay-nv arekay-nv left a comment


Thanks!

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py Outdated
Comment thread examples/03_BenchmarkComparison/README.md
Comment thread examples/03_BenchmarkComparison/README.md Outdated
@arekay-nv arekay-nv requested a review from anandhu-eng December 4, 2025 17:41
@arekay-nv
Collaborator

@anandhu-eng, can you try out this example to ensure we aren't missing anything?

Copilot AI review requested due to automatic review settings December 5, 2025 00:54
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from a44df83 to 895b918 on December 5, 2025 00:54

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

examples/03_BenchmarkComparison/compare_with_vllm.py:1

  • This appears to be duplicated logic from dataloader.py line 328. The walrus operator pattern if line := line.strip(): is used in dataloader.py but here on line 328 there's just line.strip() without assignment, which suggests this may be unintended or the diff display is incorrect. If this is actually in compare_with_vllm.py, this line has no effect.
#!/usr/bin/env python3


Comment thread src/inference_endpoint/endpoint_client/configs.py
@viraatc
Collaborator Author

viraatc commented Dec 5, 2025

rebased onto #32

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py Fixed
Copilot AI review requested due to automatic review settings December 5, 2025 01:15
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 4476f1c to 34ff0c8 on December 5, 2025 01:15

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/commands/probe.py Outdated
Comment thread src/inference_endpoint/dataset_manager/dataloader.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch 3 times, most recently from a729cad to 1e23313 on December 5, 2025 01:21
Copilot AI review requested due to automatic review settings December 5, 2025 01:24
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 1e23313 to 8f7d5d4 on December 5, 2025 01:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 8f7d5d4 to 246bca2 on December 5, 2025 01:43
Comment thread examples/03_BenchmarkComparison/README.md
Comment thread requirements/test.txt Outdated
Copilot AI review requested due to automatic review settings December 8, 2025 17:00

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Copilot AI review requested due to automatic review settings December 15, 2025 19:21
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 1d5a568 to cd7c959 on December 15, 2025 19:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/commands/probe.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from bc5420a to e3dce1b on December 15, 2025 19:29
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from e3dce1b to c3eec49 on December 15, 2025 19:35
Copilot AI review requested due to automatic review settings December 15, 2025 19:45

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.



Comment thread requirements/base.txt Outdated
Copilot AI review requested due to automatic review settings December 16, 2025 00:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Collaborator

@arekay-nv arekay-nv left a comment


LGTM!

@viraatc viraatc merged commit d291e3f into main Dec 16, 2025
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 16, 2025
@viraatc viraatc deleted the feature/viraatc-vllm-compare-example branch February 6, 2026 23:08