feat: add compare_with_vllm.py example-03#38

Merged
viraatc merged 5 commits into main from feature/viraatc-vllm-compare-example
Dec 16, 2025

Conversation

@viraatc
Collaborator

@viraatc viraatc commented Dec 3, 2025

What does this PR do?

Adds an example script that runs both vLLM's benchmark and inference-endpoint against a given endpoint URL and compares their metrics:

python examples/03_BenchmarkComparison/compare_with_vllm.py --model Qwen/Qwen2.5-0.5B-Instruct --endpoint <endpoint url>:<endpoint port> 

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

Copilot AI review requested due to automatic review settings December 3, 2025 23:33
@viraatc viraatc requested a review from a team as a code owner December 3, 2025 23:33
@github-actions

github-actions Bot commented Dec 3, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a new example to facilitate direct performance comparisons between the inference-endpoint tool and vLLM for large language models. It provides a robust Python script and accompanying documentation that enables users to benchmark and evaluate the efficiency of different inference serving solutions under consistent conditions, offering insights into their respective throughput and latency characteristics.

Highlights

  • New Benchmarking Example: Introduced a new example script, compare_with_vllm.py, designed to compare the performance metrics of inference-endpoint against vLLM's benchmarking tool.
  • Comprehensive Documentation: Added a detailed README.md for the new 03_BenchmarkComparison example, outlining prerequisites, usage instructions, available command-line options, and an example of the comparison output.
  • Automated Metric Comparison: The Python script automates the execution of both benchmarking tools, parses their respective outputs using regular expressions, and presents a clear, tabular comparison of key performance indicators like throughput, TTFT (Time To First Token), and TPOT (Time Per Output Token).
  • Server Warmup Mechanism: Included a server warmup function to ensure the inference server is fully operational and responsive before commencing the actual benchmarks, improving the reliability of performance measurements.
  • Standardized Datasets: Provided two identical JSONL datasets (dummy_prompts_ie.jsonl and dummy_prompts_vllm.jsonl), each formatted specifically for inference-endpoint and vLLM respectively, to ensure a fair comparison using the same set of prompts.
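The regex-based metric parsing and tabular comparison described in the highlights can be sketched roughly as follows. This is a minimal illustration, not the script's actual implementation: the raw output strings, regex patterns, metric names, and table layout below are all assumptions.

```python
import re

# Hypothetical raw stdout from each benchmarking tool; the real formats
# printed by vLLM's benchmark and inference-endpoint will differ.
VLLM_OUTPUT = """
Output token throughput (tok/s): 512.3
Mean TTFT (ms): 41.7
Mean TPOT (ms): 9.8
"""

IE_OUTPUT = """
Output token throughput (tok/s): 498.1
Mean TTFT (ms): 44.2
Mean TPOT (ms): 10.1
"""

# One regex per metric; group 1 captures the numeric value.
PATTERNS = {
    "Throughput (tok/s)": r"Output token throughput \(tok/s\):\s*([\d.]+)",
    "Mean TTFT (ms)": r"Mean TTFT \(ms\):\s*([\d.]+)",
    "Mean TPOT (ms)": r"Mean TPOT \(ms\):\s*([\d.]+)",
}

def parse_metrics(text):
    """Extract each metric's value from a tool's stdout, or None if absent."""
    out = {}
    for name, pat in PATTERNS.items():
        m = re.search(pat, text)
        out[name] = float(m.group(1)) if m else None
    return out

def comparison_table(vllm, ie):
    """Render a simple aligned table comparing the two result dicts.

    Assumes every metric was found; a real script would handle None.
    """
    rows = [f"{'Metric':<22}{'vLLM':>10}{'inference-endpoint':>20}"]
    for name in PATTERNS:
        rows.append(f"{name:<22}{vllm[name]:>10.1f}{ie[name]:>20.1f}")
    return "\n".join(rows)

if __name__ == "__main__":
    print(comparison_table(parse_metrics(VLLM_OUTPUT), parse_metrics(IE_OUTPUT)))
```

Parsing both outputs with the same pattern table keeps the comparison symmetric even if the two tools label their metrics differently.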


Copilot AI left a comment


Pull request overview

This PR adds a benchmark comparison example that allows users to compare performance metrics between inference-endpoint and vLLM's benchmarking tools using identical prompts.

Key Changes:

  • Adds a new example script that benchmarks both vLLM and inference-endpoint on the same dataset
  • Includes pre-generated JSONL datasets in formats compatible with each tool
  • Provides comprehensive documentation with usage examples
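The server warmup called out in the PR highlights (waiting until the endpoint responds before benchmarking) might look like this minimal sketch. The `/health` path, timeout, and polling interval are assumptions; the actual script's probe is not shown in this thread.

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url, timeout_s=120.0, poll_s=2.0):
    """Poll a health endpoint until the server answers, or raise on timeout.

    The '/health' path is an assumption; many OpenAI-compatible servers
    (including vLLM) expose one, but the real script may probe differently.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet; keep polling
        time.sleep(poll_s)
    raise TimeoutError(f"server at {base_url} not ready after {timeout_s}s")
```

Warming up before measuring matters because the first requests after model load pay one-off costs (CUDA graph capture, cache allocation) that would otherwise skew TTFT.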

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

  • examples/03_BenchmarkComparison/compare_with_vllm.py: Main comparison script that runs both benchmarks and displays comparative metrics
  • examples/03_BenchmarkComparison/dummy_prompts_vllm.jsonl: Dataset file with 96 prompts in vLLM format ({"prompt": "..."})
  • examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl: Dataset file with 96 prompts in inference-endpoint format ({"text_input": "..."})
  • examples/03_BenchmarkComparison/README.md: Documentation explaining prerequisites, usage, and expected output
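The two dataset files differ only in the JSON key used on each line. A small sketch of converting between the formats (the file names match the PR, but this conversion helper is illustrative, not part of the PR):

```python
import json

def vllm_to_ie(vllm_lines):
    """Convert vLLM-format JSONL lines ({"prompt": ...}) to
    inference-endpoint format ({"text_input": ...})."""
    out = []
    for line in vllm_lines:
        if line := line.strip():  # skip blank lines
            record = json.loads(line)
            out.append(json.dumps({"text_input": record["prompt"]}))
    return out

# Example: one vLLM-format line and its inference-endpoint counterpart.
vllm_sample = ['{"prompt": "Explain KV caching in one paragraph."}']
ie_sample = vllm_to_ie(vllm_sample)
```

Keeping the prompt text byte-identical across both files is what makes the benchmark a fair comparison; only the key name changes.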


Comment thread examples/03_BenchmarkComparison/dummy_prompts_vllm.jsonl Outdated
Comment thread examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl Outdated

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new example script for comparing the performance of inference-endpoint with vllm. The script is well-structured and includes helpful features like a dry-run mode and server warmup. The accompanying README provides clear instructions. I've made a few suggestions to improve code compatibility, robustness, and maintainability. Additionally, I noticed a duplicate prompt in the dummy datasets.

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Comment thread examples/03_BenchmarkComparison/dummy_prompts_ie.jsonl Outdated
Collaborator

@arekay-nv arekay-nv left a comment


Thanks!

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py Outdated
Comment thread examples/03_BenchmarkComparison/README.md
Comment thread examples/03_BenchmarkComparison/README.md Outdated
@arekay-nv arekay-nv requested a review from anandhu-eng December 4, 2025 17:41
@arekay-nv
Collaborator

@anandhu-eng, can you try out this example to ensure we aren't missing anything?

Copilot AI review requested due to automatic review settings December 5, 2025 00:54
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from a44df83 to 895b918 on December 5, 2025 00:54

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

examples/03_BenchmarkComparison/compare_with_vllm.py:1

  • This appears to be duplicated logic from dataloader.py line 328. The walrus operator pattern if line := line.strip(): is used in dataloader.py but here on line 328 there's just line.strip() without assignment, which suggests this may be unintended or the diff display is incorrect. If this is actually in compare_with_vllm.py, this line has no effect.
#!/usr/bin/env python3


Comment thread src/inference_endpoint/endpoint_client/configs.py
@viraatc
Collaborator Author

viraatc commented Dec 5, 2025

rebased onto #32

Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py Fixed
Copilot AI review requested due to automatic review settings December 5, 2025 01:15
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 4476f1c to 34ff0c8 on December 5, 2025 01:15

Copilot AI left a comment


Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 3 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/commands/probe.py Outdated
Comment thread src/inference_endpoint/dataset_manager/dataloader.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch 3 times, most recently from a729cad to 1e23313 on December 5, 2025 01:21
Copilot AI review requested due to automatic review settings December 5, 2025 01:24
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 1e23313 to 8f7d5d4 on December 5, 2025 01:24

Copilot AI left a comment


Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 8f7d5d4 to 246bca2 on December 5, 2025 01:43
Comment thread examples/03_BenchmarkComparison/README.md
Comment thread requirements/test.txt Outdated
Copilot AI review requested due to automatic review settings December 8, 2025 17:00

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.



Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Copilot AI review requested due to automatic review settings December 15, 2025 19:21
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from 1d5a568 to cd7c959 on December 15, 2025 19:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 29 out of 29 changed files in this pull request and generated 3 comments.



Comment thread src/inference_endpoint/endpoint_client/configs.py
Comment thread src/inference_endpoint/commands/probe.py
Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from bc5420a to e3dce1b on December 15, 2025 19:29
@viraatc viraatc force-pushed the feature/viraatc-vllm-compare-example branch from e3dce1b to c3eec49 on December 15, 2025 19:35
Copilot AI review requested due to automatic review settings December 15, 2025 19:45

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.



Comment thread requirements/base.txt Outdated
Copilot AI review requested due to automatic review settings December 16, 2025 00:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 7 out of 8 changed files in this pull request and generated 1 comment.



Comment thread examples/03_BenchmarkComparison/compare_with_vllm.py
Collaborator

@arekay-nv arekay-nv left a comment


LGTM!

@viraatc viraatc merged commit d291e3f into main Dec 16, 2025
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Dec 16, 2025
@viraatc viraatc deleted the feature/viraatc-vllm-compare-example branch February 6, 2026 23:08