Use reporter-generated Report as source for final metrics by anandhu-eng · Pull Request #43 · mlcommons/endpoints

anandhu-eng · 2025-12-05T12:30:29Z

What does this PR do?

Type of change

Bug fix
New feature
Documentation update
Refactor/cleanup

Related issues

Testing

Tests added/updated
All tests pass locally
Manual testing completed

Checklist

Code follows project style
Pre-commit hooks pass
Documentation updated (if needed)

github-actions · 2025-12-05T12:30:38Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

gemini-code-assist · 2025-12-05T12:30:43Z

Summary of Changes

This pull request refactors the benchmark command to standardize how performance metrics are collected and reported. Instead of calculating metrics ad-hoc, it now leverages a dedicated report object generated by the session's load generator. This change aims to improve the accuracy and reliability of benchmark results by using a single, authoritative source for all key performance indicators, while also adding error handling for missing reports.

Highlights

Centralized Metric Source: The benchmark command now uses a report object, generated by the session's reporter, as the authoritative source for all final metrics such as elapsed time, total samples, successful samples, and estimated QPS.
Enhanced Metric Accuracy: Because every final metric is read from the reporter-generated report, the values printed by the benchmark command match the report's own figures instead of a separately tracked copy. The estimated QPS uses the report's value when it is available and falls back to a computed value otherwise.
Robust Error Handling: A check has been introduced to verify the presence of the session.report object. If the report is missing, an ExecutionError is raised, preventing the benchmark from proceeding with incomplete data.
Session Report Storage: The Session class now includes a report attribute, which is populated after the test finishes, allowing external callers (like the benchmark command) to access the comprehensive test report.
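The session-side storage and the missing-report check can be sketched as follows. This is a minimal illustration under stated assumptions, not the merged code: `Report`, `BenchmarkSession`, `ExecutionError`, and `final_report` are simplified, hypothetical stand-ins. Only the attribute name `report` and the field names `duration_ns`, `n_samples_issued`, `n_samples_completed`, and `qps` are taken from the PR description.

```python
from dataclasses import dataclass
from typing import Optional


class ExecutionError(Exception):
    """Hypothetical stand-in for the project's ExecutionError."""


@dataclass
class Report:
    # Field names from the PR description; types are assumptions.
    duration_ns: int
    n_samples_issued: int
    n_samples_completed: int
    qps: Optional[float] = None


class BenchmarkSession:
    """Hypothetical, heavily simplified session that keeps its final report."""

    def __init__(self) -> None:
        # None until the test finishes and the reporter has produced a report.
        self.report: Optional[Report] = None

    def run(self, issued: int, completed: int, duration_ns: int) -> None:
        # Issuing queries and waiting for responses is omitted; here the
        # "reporter" output is simply built from the given counters.
        self.report = Report(duration_ns, issued, completed)


def final_report(session: BenchmarkSession) -> Report:
    """Return the session's report, or fail instead of using incomplete data."""
    if session.report is None:
        raise ExecutionError("Benchmark finished without a report; cannot compute final metrics")
    return session.report
```

With this shape, the caller reads `final_report(session).duration_ns` and the other fields rather than keeping its own timer or sample counters.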

gemini-code-assist

Code Review

This pull request refactors the benchmark command to use the reporter-generated Report object as the authoritative source for final metrics. This is a good change that centralizes metric calculation and improves accuracy. The changes in BenchmarkSession to store the report and in _run_benchmark to consume it are correct. My review includes a suggestion to simplify the metric-gathering logic in benchmark.py for better readability and maintainability.

Copilot

Pull request overview

This PR refactors the benchmark command to use the reporter-generated Report object as the authoritative source for final metrics, rather than manually tracking timing and sample counts. This improves consistency and reduces redundancy by leveraging the comprehensive metrics already computed by the MetricsReporter.

Key changes:

Store the generated report on the BenchmarkSession object for external access
Replace manual timing (start_time/elapsed_time) and counting (scheduler.total_samples_to_issue) with values from the report
Add error handling to ensure the report is available before extracting metrics

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
src/inference_endpoint/load_generator/session.py	Added `self.report` attribute to store the generated report after test completion, enabling external callers to access authoritative metrics
src/inference_endpoint/commands/benchmark.py	Refactored metrics calculation to use report fields (`duration_ns`, `n_samples_issued`, `n_samples_completed`, `qps`) instead of manual tracking, with error handling for missing reports
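The field mapping above can be sketched as a single helper that derives every final metric from the report. It assumes the report exposes the four fields listed in the table, takes `qps` from the report when it is set, and otherwise falls back to completed samples divided by elapsed seconds. `summarize` and `render_cli` are hypothetical names, and the fallback formula is an assumption rather than the PR's exact computation. Feeding the same dict to both the console line and the JSON output mirrors the aim of issue #39, that the CLI report and the JSON report use the same data.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Report:
    # Hypothetical shape: field names come from the PR description, types are assumed.
    duration_ns: int
    n_samples_issued: int
    n_samples_completed: int
    qps: Optional[float] = None


def summarize(report: Report) -> dict:
    """Hypothetical helper: derive all final metrics from the report alone."""
    elapsed_s = report.duration_ns / 1e9  # nanoseconds -> seconds
    qps = report.qps
    if qps is None:
        # Assumed fallback: throughput from the report's own counters.
        qps = report.n_samples_completed / elapsed_s if elapsed_s > 0 else 0.0
    return {
        "elapsed_time_s": elapsed_s,
        "total_samples": report.n_samples_issued,
        "successful_samples": report.n_samples_completed,
        "estimated_qps": qps,
    }


def render_cli(metrics: dict) -> str:
    """Format the same metrics dict that is written to JSON."""
    return (
        f"{metrics['successful_samples']}/{metrics['total_samples']} samples in "
        f"{metrics['elapsed_time_s']:.2f}s ({metrics['estimated_qps']:.1f} QPS)"
    )


if __name__ == "__main__":
    metrics = summarize(Report(duration_ns=2_000_000_000, n_samples_issued=100, n_samples_completed=98))
    print(render_cli(metrics))  # 98/100 samples in 2.00s (49.0 QPS)
    print(json.dumps(metrics))
```

Computing the dict once means the console summary cannot drift from the saved results.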

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

Copilot

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

tests/unit/commands/test_utils.py:194

This test creates multiple template files in the current working directory without cleaning them up. Each iteration writes a file such as "offline_template.yaml" or "online_template.yaml", and these files remain after the test completes. Besides polluting the workspace, leftover files can mask failures across runs: a file from a previous run would satisfy output_file.exists() even if generation were broken. Consider removing the generated files after the assertions, or creating them under tmp_path, a per-test temporary directory managed by pytest.

    async def test_init_all_templates(self):
        """Test generating all template types."""
        templates = ["offline", "online", "eval", "submission"]

        for template_type in templates:
            output_file = Path(f"{template_type}_template.yaml")
            args = MagicMock()
            args.template = template_type

            await run_init_command(args)

            assert output_file.exists()
            assert output_file.stat().st_size > 0
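One way to act on this suggestion, sketched with only the standard library so it runs outside pytest: run the generator inside a temporary directory and restore the previous working directory afterwards. `write_template` is a hypothetical synchronous stand-in for `run_init_command`, and `working_directory` and `check_all_templates` are illustrative helpers, not code from the PR. Inside pytest, the built-in `tmp_path` fixture combined with `monkeypatch.chdir(tmp_path)` gives the same isolation with less code.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path: Path):
    """Temporarily chdir into `path`, restoring the previous cwd afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)


def write_template(template_type: str) -> Path:
    """Hypothetical stand-in for run_init_command: writes <type>_template.yaml in the cwd."""
    output_file = Path(f"{template_type}_template.yaml")
    output_file.write_text(f"# {template_type} template\n")
    return output_file


def check_all_templates() -> list:
    templates = ["offline", "online", "eval", "submission"]
    with tempfile.TemporaryDirectory() as tmp, working_directory(Path(tmp)):
        for template_type in templates:
            output_file = write_template(template_type)
            assert output_file.exists()
            assert output_file.stat().st_size > 0
        created = sorted(p.name for p in Path(tmp).iterdir())
    # Both contexts have exited: cwd is restored and the temp dir is deleted.
    return created
```

The `with` statement exits the directory change before the temporary directory is deleted, so the process never sits in a removed directory.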

arekay-nv

LGTM - Thanks!

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

use reporter-generated Report as source for final metrics

72055cb

Copilot AI review requested due to automatic review settings December 5, 2025 12:30

anandhu-eng requested a review from a team as a code owner December 5, 2025 12:30

github-actions Bot requested review from arekay-nv and nvzhihanj December 5, 2025 12:30

Copilot started reviewing on behalf of anandhu-eng December 5, 2025 12:30

Copilot finished reviewing on behalf of anandhu-eng December 5, 2025 12:32

gemini-code-assist Bot reviewed Dec 5, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Copilot AI reviewed Dec 5, 2025

remove unused import

c9e0394

anandhu-eng linked an issue Dec 5, 2025 that may be closed by this pull request

[bug] Ensure CLI report and json report use same data #39

Closed

clean code

b0d9de4

Copilot AI review requested due to automatic review settings December 5, 2025 13:01

Copilot started reviewing on behalf of anandhu-eng December 5, 2025 13:02

Copilot finished reviewing on behalf of anandhu-eng December 5, 2025 13:04

Copilot AI reviewed Dec 5, 2025

Comment thread src/inference_endpoint/commands/benchmark.py

arekay-nv requested a review from nv-alicheng December 5, 2025 15:36

arekay-nv reviewed Dec 10, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

anandhu-eng and others added 2 commits December 11, 2025 12:05

Merge branch 'main' into fix/use_reported_generated_results

10aac0a

Use report directory to save final results

05cbb1e

Copilot AI review requested due to automatic review settings December 11, 2025 07:17

Copilot started reviewing on behalf of anandhu-eng December 11, 2025 07:17

Copilot AI reviewed Dec 11, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Comment thread src/inference_endpoint/commands/benchmark.py

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

anandhu-eng marked this pull request as draft December 11, 2025 15:07

anandhu-eng added 2 commits December 11, 2025 20:58

Reuse Path object + code clean

ef70628

Fix for tests

8db55a1

Copilot AI review requested due to automatic review settings December 11, 2025 18:02

Copilot started reviewing on behalf of anandhu-eng December 11, 2025 18:03

anandhu-eng marked this pull request as ready for review December 13, 2025 20:07

Copilot AI review requested due to automatic review settings December 13, 2025 20:07

github-actions Bot requested a review from arekay-nv December 13, 2025 20:07

Copilot started reviewing on behalf of anandhu-eng December 13, 2025 20:08

Copilot AI reviewed Dec 13, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Update error description

6a75b0b

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings December 13, 2025 20:14

Copilot started reviewing on behalf of anandhu-eng December 13, 2025 20:14

Copilot AI reviewed Dec 13, 2025

anandhu-eng and others added 2 commits December 14, 2025 01:48

Pre commit changes

e5d3dd6

Merge branch 'main' into fix/use_reported_generated_results

a64bcc9

Copilot AI review requested due to automatic review settings December 19, 2025 12:04

Copilot started reviewing on behalf of anandhu-eng December 19, 2025 12:05

Copilot AI reviewed Dec 19, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

Comment thread src/inference_endpoint/cli.py

anandhu-eng added 2 commits December 19, 2025 18:57

remove output arg for aux args parser

d67442c

remove --output flag for Init

f3b648c

Copilot AI review requested due to automatic review settings December 21, 2025 13:03

Copilot started reviewing on behalf of anandhu-eng December 21, 2025 13:04

Copilot AI reviewed Dec 21, 2025

Comment thread src/inference_endpoint/commands/benchmark.py Outdated

clean files after test + fix logic for failed count

482c231

arekay-nv approved these changes Dec 23, 2025

Merge branch 'main' into fix/use_reported_generated_results

5e9327c

Copilot AI review requested due to automatic review settings December 31, 2025 17:45

Copilot AI reviewed Dec 31, 2025

Comment thread src/inference_endpoint/commands/benchmark.py

Comment thread src/inference_endpoint/commands/benchmark.py

Comment thread src/inference_endpoint/commands/utils.py Outdated

Update src/inference_endpoint/commands/utils.py

9ae3c68

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings December 31, 2025 17:50

Copilot AI reviewed Dec 31, 2025

Comment thread tests/unit/commands/test_utils.py

arekay-nv merged commit ba36ac4 into main Dec 31, 2025
4 checks passed

github-actions Bot locked and limited conversation to collaborators Dec 31, 2025

arekay-nv deleted the fix/use_reported_generated_results branch January 5, 2026 23:06
