
feat: add concurrency based scheduler #17

Merged
nvzhihanj merged 20 commits into main from feature/viraatc-concurrency-issuer
Nov 12, 2025
Conversation

@viraatc viraatc commented Nov 7, 2025

What does this PR do?

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@viraatc viraatc requested a review from a team as a code owner November 7, 2025 00:04
Copilot AI review requested due to automatic review settings November 7, 2025 00:04

github-actions Bot commented Nov 7, 2025

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@viraatc viraatc self-assigned this Nov 7, 2025
@gemini-code-assist

Summary of Changes

Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the online benchmarking capabilities by introducing a concurrency-based load generation scheduler. This new feature allows users to simulate workloads where the number of simultaneous active requests is fixed, providing a different perspective on performance testing compared to the existing QPS-based Poisson scheduler. The changes include core scheduler implementation, updates to CLI argument parsing, and extensive documentation to guide users on the new options. The client-side sample issuer was also refactored for better maintainability.

Highlights

  • New Concurrency Scheduler: Introduced a new concurrency-based scheduler for online benchmarking, allowing users to maintain a fixed number of concurrent requests rather than a target QPS. This is enabled via the --load-pattern concurrency and --concurrency N CLI arguments.
  • Documentation Updates: Updated README.md, docs/CLI_QUICK_REFERENCE.md, and docs/LOCAL_TESTING.md to reflect the new --load-pattern argument, its required nature, and the specific usage of --target-qps for Poisson and --concurrency for the new concurrency pattern.
  • Refactoring and Code Clean-up: The loadgen.py file was renamed to http_sample_issuer.py for clearer semantics. The logic for handling StreamChunk and QueryResult in the HTTP client was simplified, and _client_idle_event.clear() was optimized.
  • Improved CLI Argument Handling: The --target-qps and --concurrency arguments were moved from shared benchmark arguments to online-specific arguments, making the CLI more precise. The --load-pattern argument is now explicitly required for online benchmarking.
  • Enhanced Testing: Added comprehensive unit tests for the new ConcurrencyScheduler to ensure proper gating of requests based on completions and adherence to the target concurrency level. Statistical tests were also added for the PoissonDistributionScheduler to validate its inter-arrival time distribution.
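The concurrency pattern summarized above can be sketched in a few lines. This is an illustrative reimplementation, not the PR's actual ConcurrencyScheduler; the class name, `wait_for_slot`/`release_slot` methods, and `inflight` property are all assumptions for this sketch.

```python
import threading


class ConcurrencyScheduler:
    """Sketch: keep at most `concurrency` requests in flight at once.

    An issuer thread calls wait_for_slot() before sending a request
    (blocking while `concurrency` requests are already active) and
    release_slot() when a response completes, waking one waiter.
    Hypothetical API, not the PR's actual class.
    """

    def __init__(self, concurrency: int):
        if concurrency < 1:
            raise ValueError("concurrency must be >= 1")
        self._slots = threading.Semaphore(concurrency)
        self._lock = threading.Lock()
        self._inflight = 0

    def wait_for_slot(self) -> None:
        self._slots.acquire()  # blocks once `concurrency` requests are active
        with self._lock:
            self._inflight += 1

    def release_slot(self) -> None:
        with self._lock:
            self._inflight -= 1
        self._slots.release()  # completion frees a slot for the next request

    @property
    def inflight(self) -> int:
        with self._lock:
            return self._inflight
```

The key design point is that the next request is gated on a completion rather than on a clock, which is what distinguishes this pattern from the QPS-driven Poisson scheduler.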


Copilot AI left a comment


Pull Request Overview

This PR introduces a concurrency-based scheduler for load testing, enabling fixed concurrent request patterns alongside the existing max-throughput and Poisson-based schedulers. This completes the scheduler architecture by implementing the previously TODO-marked concurrency mode.

Key changes:

  • Implemented ConcurrencyScheduler that maintains a fixed number of concurrent requests using event-driven coordination
  • Added comprehensive test coverage including statistical validation for Poisson distribution
  • Updated CLI to require explicit --load-pattern selection for online mode with corresponding parameter validation
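The statistical validation mentioned above rests on a standard property of Poisson processes: at a target rate of `qps` requests per second, inter-arrival times are i.i.d. exponential with mean 1/qps. The PR's tests reportedly use scipy; the stdlib-only sketch below (with a hypothetical `poisson_interarrivals` helper) checks just the sample mean.

```python
import random
import statistics


def poisson_interarrivals(target_qps: float, n: int, seed: int = 0) -> list[float]:
    """Draw n inter-arrival times for a Poisson process at `target_qps`.

    Gaps between Poisson arrivals are exponential with mean
    1/target_qps; this is what a Poisson-distribution scheduler
    sleeps between issuing successive requests.
    """
    rng = random.Random(seed)
    return [rng.expovariate(target_qps) for _ in range(n)]


# Sanity check: at qps=10 the mean gap should be close to 0.1 s.
gaps = poisson_interarrivals(target_qps=10.0, n=100_000)
mean_gap = statistics.fmean(gaps)
assert abs(mean_gap - 0.1) < 0.005, mean_gap
```

A full distributional test would compare the empirical gaps against the exponential CDF (e.g. a Kolmogorov-Smirnov test via `scipy.stats.kstest`), rather than only the mean.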

Reviewed Changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file

  • src/inference_endpoint/load_generator/scheduler.py — Implements ConcurrencyScheduler with threading-based concurrency control
  • src/inference_endpoint/config/schema.py — Updates LoadPattern configuration and validation for concurrency mode
  • src/inference_endpoint/config/runtime_settings.py — Adds load_pattern field to RuntimeSettings
  • src/inference_endpoint/commands/benchmark.py — Updates CLI logic to auto-detect load pattern and require explicit selection for online mode
  • src/inference_endpoint/cli.py — Refactors argument parsing to make --load-pattern required for online mode
  • src/inference_endpoint/endpoint_client/http_sample_issuer.py — Optimizes idle event handling and simplifies error response processing
  • tests/unit/load_generator/test_scheduler.py — Adds comprehensive tests for ConcurrencyScheduler and PoissonDistributionScheduler with statistical validation
  • tests/conftest.py — Adds fixtures for concurrency and poisson runtime settings
  • requirements/test.txt — Adds scipy dependency for statistical testing
  • docs/*.md — Updates documentation to reflect new concurrency mode and required CLI parameters


Comment thread tests/unit/load_generator/test_scheduler.py
Comment thread src/inference_endpoint/endpoint_client/http_sample_issuer.py Outdated

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a new concurrency-based scheduler, a valuable addition for benchmarking. The implementation of the ConcurrencyScheduler is robust, and the related changes to the CLI, configuration, and documentation are well-executed and consistent. The new tests are particularly impressive, especially the deterministic test for the concurrency scheduler and the statistical validation for the Poisson scheduler, which significantly enhance the reliability of the load generation. I've identified one minor issue regarding code clarity in the CLI command logic that could be improved. Overall, this is an excellent and high-quality contribution.

Comment thread src/inference_endpoint/commands/benchmark.py
issued[position].wait()

with state_lock:
    assert current_inflight == target_concurrency
Collaborator Author


we check current_inflight at various points
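The gating pattern discussed in this thread can be sketched end-to-end as follows. This is a simplified, hypothetical reconstruction of the test's shape, not the PR's actual test code: each issued request sets an event, the main thread waits on all of them, and only then checks the in-flight invariant under the lock.

```python
import threading

target_concurrency = 3
state_lock = threading.Lock()
current_inflight = 0
issued = [threading.Event() for _ in range(target_concurrency)]


def issue(position: int) -> None:
    """Simulate issuing one request and signal that it is in flight."""
    global current_inflight
    with state_lock:
        current_inflight += 1
    issued[position].set()


threads = [threading.Thread(target=issue, args=(i,)) for i in range(target_concurrency)]
for t in threads:
    t.start()

# Wait until every request has been issued before asserting, which is
# what makes the check deterministic despite the threads racing.
for position in range(target_concurrency):
    issued[position].wait()

with state_lock:
    assert current_inflight == target_concurrency
```

Waiting on the events first removes the race between thread start-up and the assertion, which is why the in-flight count can be checked at well-defined points rather than polled.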

Collaborator


I cannot seem to resolve this conversation?


@nvzhihanj nvzhihanj left a comment


LGTM

Copilot AI review requested due to automatic review settings November 8, 2025 00:15
@viraatc viraatc force-pushed the feature/viraatc-concurrency-issuer branch from 0c59560 to 6bd373c on November 8, 2025 00:15

Copilot AI left a comment


Pull Request Overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.



Comment thread src/inference_endpoint/endpoint_client/http_sample_issuer.py Outdated
Comment thread tests/unit/load_generator/test_scheduler.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings November 9, 2025 22:05

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Copilot AI review requested due to automatic review settings November 9, 2025 22:14

Copilot AI left a comment


Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@nvzhihanj nvzhihanj merged commit 26daafe into main Nov 12, 2025
4 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Nov 12, 2025
@viraatc viraatc deleted the feature/viraatc-concurrency-issuer branch February 6, 2026 23:08