MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Gemini Code Assist — Summary of Changes: Hello @viraatc, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the online benchmarking capabilities by introducing a concurrency-based load-generation scheduler. The new feature lets users simulate workloads in which the number of simultaneously active requests is fixed, offering a different perspective on performance testing than the existing QPS-based Poisson scheduler. The changes include the core scheduler implementation, updates to CLI argument parsing, and documentation covering the new options. The client-side sample issuer was also refactored for better maintainability.
Pull Request Overview
This PR introduces a concurrency-based scheduler for load testing, enabling fixed concurrent request patterns alongside the existing max-throughput and Poisson-based schedulers. This completes the scheduler architecture by implementing the previously TODO-marked concurrency mode.
Key changes:
- Implemented `ConcurrencyScheduler`, which maintains a fixed number of concurrent requests using event-driven coordination
- Added comprehensive test coverage, including statistical validation for the Poisson distribution
- Updated the CLI to require explicit `--load-pattern` selection for online mode, with corresponding parameter validation
Reviewed Changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Summary per file:
| File | Description |
|---|---|
| src/inference_endpoint/load_generator/scheduler.py | Implements ConcurrencyScheduler with threading-based concurrency control |
| src/inference_endpoint/config/schema.py | Updates LoadPattern configuration and validation for concurrency mode |
| src/inference_endpoint/config/runtime_settings.py | Adds load_pattern field to RuntimeSettings |
| src/inference_endpoint/commands/benchmark.py | Updates CLI logic to auto-detect load pattern and require explicit selection for online mode |
| src/inference_endpoint/cli.py | Refactors argument parsing to make --load-pattern required for online mode |
| src/inference_endpoint/endpoint_client/http_sample_issuer.py | Optimizes idle event handling and simplifies error response processing |
| tests/unit/load_generator/test_scheduler.py | Adds comprehensive tests for ConcurrencyScheduler and PoissonDistributionScheduler with statistical validation |
| tests/conftest.py | Adds fixtures for concurrency and poisson runtime settings |
| requirements/test.txt | Adds scipy dependency for statistical testing |
| docs/*.md | Updates documentation to reflect new concurrency mode and required CLI parameters |
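The new scipy dependency supports the statistical validation mentioned in the test summary. As a hedged sketch of the kind of check such a test can perform (not necessarily the PR's exact test): inter-arrival gaps of a Poisson arrival process at rate `qps` are exponentially distributed with mean `1/qps`, which a Kolmogorov–Smirnov test can validate. The synthetic draw below stands in for gaps produced by a Poisson scheduler.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
qps = 10.0

# Inter-arrival times of a Poisson process with rate `qps` are
# exponentially distributed with mean 1/qps. Draw synthetic gaps
# the way a Poisson scheduler would, then check the distribution.
gaps = rng.exponential(scale=1.0 / qps, size=5000)

# Kolmogorov-Smirnov test against the expected exponential distribution;
# a non-tiny p-value means we cannot reject exponentiality of the gaps.
statistic, p_value = stats.kstest(gaps, "expon", args=(0, 1.0 / qps))
assert p_value > 0.001, f"inter-arrival times look non-exponential (p={p_value:.4f})"
```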
Code Review
This pull request introduces a new concurrency-based scheduler, a valuable addition for benchmarking. The implementation of the ConcurrencyScheduler is robust, and the related changes to the CLI, configuration, and documentation are well-executed and consistent. The new tests are particularly impressive, especially the deterministic test for the concurrency scheduler and the statistical validation for the Poisson scheduler, which significantly enhance the reliability of the load generation. I've identified one minor issue regarding code clarity in the CLI command logic that could be improved. Overall, this is an excellent and high-quality contribution.
```python
issued[position].wait()

with state_lock:
    assert current_inflight == target_concurrency
```
We check `current_inflight` at various points.
I cannot seem to resolve this conversation?
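The quoted fragment (`issued[position].wait()`, then asserting `current_inflight` under `state_lock`) suggests an event-driven test pattern: a fake issuer signals an `Event` per sample and blocks until released, so the test can assert the in-flight count at a deterministic point. The sketch below reconstructs that pattern; only the names from the fragment are taken from the PR, and all surrounding scaffolding is assumed.

```python
import threading

target_concurrency = 3
issued = [threading.Event() for _ in range(target_concurrency)]
release = threading.Event()
state_lock = threading.Lock()
current_inflight = 0

def fake_issue(position):
    """Assumed stand-in for the sample issuer used by the real test."""
    global current_inflight
    with state_lock:
        current_inflight += 1
    issued[position].set()   # tell the test this request is now in flight
    release.wait()           # hold the slot until the test releases it
    with state_lock:
        current_inflight -= 1

threads = [threading.Thread(target=fake_issue, args=(i,))
           for i in range(target_concurrency)]
for t in threads:
    t.start()

for position in range(target_concurrency):
    issued[position].wait()  # deterministic: all requests are now in flight

with state_lock:
    assert current_inflight == target_concurrency

release.set()                # let every in-flight request complete
for t in threads:
    t.join()
```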
Force-pushed from 0c59560 to 6bd373c (Compare)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
What does this PR do?
Type of change
Related issues
Testing
Checklist