
feat: add concurrency based scheduler #17

Merged
nvzhihanj merged 20 commits into main from feature/viraatc-concurrency-issuer
Nov 12, 2025
Conversation

@viraatc viraatc commented Nov 7, 2025

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner November 7, 2025 00:04
Copilot AI review requested due to automatic review settings November 7, 2025 00:04

github-actions Bot commented Nov 7, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@viraatc viraatc self-assigned this Nov 7, 2025
@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the online benchmarking capabilities by introducing a concurrency-based load generation scheduler. This new feature allows users to simulate workloads where the number of simultaneous active requests is fixed, providing a different perspective on performance testing compared to the existing QPS-based Poisson scheduler. The changes include core scheduler implementation, updates to CLI argument parsing, and extensive documentation to guide users on the new options. The client-side sample issuer was also refactored for better maintainability.

Highlights

  • New Concurrency Scheduler: Introduced a new concurrency-based scheduler for online benchmarking, allowing users to maintain a fixed number of concurrent requests rather than a target QPS. This is enabled via the --load-pattern concurrency and --concurrency N CLI arguments.
  • Documentation Updates: Updated README.md, docs/CLI_QUICK_REFERENCE.md, and docs/LOCAL_TESTING.md to reflect the new --load-pattern argument, its required nature, and the specific usage of --target-qps for Poisson and --concurrency for the new concurrency pattern.
  • Refactoring and Code Clean-up: The loadgen.py file was renamed to http_sample_issuer.py for clearer semantics. The logic for handling StreamChunk and QueryResult in the HTTP client was simplified, and _client_idle_event.clear() was optimized.
  • Improved CLI Argument Handling: The --target-qps and --concurrency arguments were moved from shared benchmark arguments to online-specific arguments, making the CLI more precise. The --load-pattern argument is now explicitly required for online benchmarking.
  • Enhanced Testing: Added comprehensive unit tests for the new ConcurrencyScheduler to ensure proper gating of requests based on completions and adherence to the target concurrency level. Statistical tests were also added for the PoissonDistributionScheduler to validate its inter-arrival time distribution.
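The concurrency pattern summarized above can be sketched in a few lines. This is an illustrative reimplementation, not the PR's actual ConcurrencyScheduler; the class name, `wait_for_slot`/`release_slot` methods, and `inflight` property are all assumptions for this sketch.

```python
import threading


class ConcurrencyScheduler:
    """Sketch: keep at most `concurrency` requests in flight at once.

    An issuer thread calls wait_for_slot() before sending a request
    (blocking while `concurrency` requests are already active) and
    release_slot() when a response completes, waking one waiter.
    Hypothetical API, not the PR's actual class.
    """

    def __init__(self, concurrency: int):
        if concurrency < 1:
            raise ValueError("concurrency must be >= 1")
        self._slots = threading.Semaphore(concurrency)
        self._lock = threading.Lock()
        self._inflight = 0

    def wait_for_slot(self) -> None:
        self._slots.acquire()  # blocks once `concurrency` requests are active
        with self._lock:
            self._inflight += 1

    def release_slot(self) -> None:
        with self._lock:
            self._inflight -= 1
        self._slots.release()  # completion frees a slot for the next request

    @property
    def inflight(self) -> int:
        with self._lock:
            return self._inflight
```

The key design point is that the next request is gated on a completion rather than on a clock, which is what distinguishes this pattern from the QPS-driven Poisson scheduler.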


Copilot AI left a comment


Pull Request Overview

This PR introduces a concurrency-based scheduler for load testing, enabling fixed concurrent request patterns alongside the existing max-throughput and Poisson-based schedulers. This completes the scheduler architecture by implementing the previously TODO-marked concurrency mode.

Key changes:

  • Implemented ConcurrencyScheduler that maintains a fixed number of concurrent requests using event-driven coordination
  • Added comprehensive test coverage including statistical validation for Poisson distribution
  • Updated CLI to require explicit --load-pattern selection for online mode with corresponding parameter validation
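The statistical validation mentioned above rests on a standard property of Poisson processes: at a target rate of `qps` requests per second, inter-arrival times are i.i.d. exponential with mean 1/qps. The PR's tests reportedly use scipy; the stdlib-only sketch below (with a hypothetical `poisson_interarrivals` helper) checks just the sample mean.

```python
import random
import statistics


def poisson_interarrivals(target_qps: float, n: int, seed: int = 0) -> list[float]:
    """Draw n inter-arrival times for a Poisson process at `target_qps`.

    Gaps between Poisson arrivals are exponential with mean
    1/target_qps; this is what a Poisson-distribution scheduler
    sleeps between issuing successive requests.
    """
    rng = random.Random(seed)
    return [rng.expovariate(target_qps) for _ in range(n)]


# Sanity check: at qps=10 the mean gap should be close to 0.1 s.
gaps = poisson_interarrivals(target_qps=10.0, n=100_000)
mean_gap = statistics.fmean(gaps)
assert abs(mean_gap - 0.1) < 0.005, mean_gap
```

A full distributional test would compare the empirical gaps against the exponential CDF (e.g. a Kolmogorov-Smirnov test via `scipy.stats.kstest`), rather than only the mean.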

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file

  • src/inference_endpoint/load_generator/scheduler.py — Implements ConcurrencyScheduler with threading-based concurrency control
  • src/inference_endpoint/config/schema.py — Updates LoadPattern configuration and validation for concurrency mode
  • src/inference_endpoint/config/runtime_settings.py — Adds load_pattern field to RuntimeSettings
  • src/inference_endpoint/commands/benchmark.py — Updates CLI logic to auto-detect load pattern and require explicit selection for online mode
  • src/inference_endpoint/cli.py — Refactors argument parsing to make --load-pattern required for online mode
  • src/inference_endpoint/endpoint_client/http_sample_issuer.py — Optimizes idle event handling and simplifies error response processing
  • tests/unit/load_generator/test_scheduler.py — Adds comprehensive tests for ConcurrencyScheduler and PoissonDistributionScheduler with statistical validation
  • tests/conftest.py — Adds fixtures for concurrency and poisson runtime settings
  • requirements/test.txt — Adds scipy dependency for statistical testing
  • docs/*.md — Updates documentation to reflect new concurrency mode and required CLI parameters


Comment thread tests/unit/load_generator/test_scheduler.py
Comment thread src/inference_endpoint/endpoint_client/http_sample_issuer.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new concurrency-based scheduler, a valuable addition for benchmarking. The implementation of the ConcurrencyScheduler is robust, and the related changes to the CLI, configuration, and documentation are well-executed and consistent. The new tests are particularly impressive, especially the deterministic test for the concurrency scheduler and the statistical validation for the Poisson scheduler, which significantly enhance the reliability of the load generation. I've identified one minor issue regarding code clarity in the CLI command logic that could be improved. Overall, this is an excellent and high-quality contribution.

Comment thread src/inference_endpoint/commands/benchmark.py
issued[position].wait()

with state_lock:
    assert current_inflight == target_concurrency
Collaborator Author


we check current_inflight at various points
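The gating pattern discussed in this thread can be sketched end-to-end as follows. This is a simplified, hypothetical reconstruction of the test's shape, not the PR's actual test code: each issued request sets an event, the main thread waits on all of them, and only then checks the in-flight invariant under the lock.

```python
import threading

target_concurrency = 3
state_lock = threading.Lock()
current_inflight = 0
issued = [threading.Event() for _ in range(target_concurrency)]


def issue(position: int) -> None:
    """Simulate issuing one request and signal that it is in flight."""
    global current_inflight
    with state_lock:
        current_inflight += 1
    issued[position].set()


threads = [threading.Thread(target=issue, args=(i,)) for i in range(target_concurrency)]
for t in threads:
    t.start()

# Wait until every request has been issued before asserting, which is
# what makes the check deterministic despite the threads racing.
for position in range(target_concurrency):
    issued[position].wait()

with state_lock:
    assert current_inflight == target_concurrency
```

Waiting on the events first removes the race between thread start-up and the assertion, which is why the in-flight count can be checked at well-defined points rather than polled.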

Collaborator


I cannot seem to resolve this conversation?


@nvzhihanj nvzhihanj left a comment


LGTM

Copilot AI review requested due to automatic review settings November 8, 2025 00:15
@viraatc viraatc force-pushed the feature/viraatc-concurrency-issuer branch from 0c59560 to 6bd373c on November 8, 2025 00:15

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/http_sample_issuer.py Outdated
Comment thread tests/unit/load_generator/test_scheduler.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings November 9, 2025 22:05

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings November 9, 2025 22:14

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@nvzhihanj nvzhihanj merged commit 26daafe into main Nov 12, 2025
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Nov 12, 2025
@viraatc viraatc deleted the feature/viraatc-concurrency-issuer branch February 6, 2026 23:08