Adds LanguageModelRateLimitingPlugin. Closes #1309 #1324

Merged: garrytrinder merged 4 commits into dotnet:main from waldekmastykarz:lmratelimitingplugin on Jul 15, 2025
@waldekmastykarz
Collaborator

Adds LanguageModelRateLimitingPlugin. Closes #1309

Test:

devproxyrc.json:

{
  "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/rc.schema.json",
  "plugins": [
    {
      "name": "LanguageModelRateLimitingPlugin",
      "enabled": true,
      "pluginPath": "~appFolder/plugins/DevProxy.Plugins.dll",
      "configSection": "languageModelRateLimitingPlugin"
    }
  ],
  "urlsToWatch": [
    "*"
  ],
  "languageModelRateLimitingPlugin": {
    "$schema": "https://raw.githubusercontent.com/dotnet/dev-proxy/main/schemas/v1.0.0/languagemodelratelimitingplugin.schema.json",
    "promptTokenLimit": 500,
    "completionTokenLimit": 500,
    "resetTimeWindowSeconds": 300
  },
  "logLevel": "information",
  "newVersionNotification": "stable",
  "showSkipMessages": true,
  "showTimestamps": true,
  "validateSchemas": true,
  "asSystemProxy": false
}

Call Ollama a few times:

curl -ikx http://127.0.0.1:8000 -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {
        "role": "user",
        "content": "Why is the sky blue?"
      }
    ]
  }'
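
To repeat the call without retyping it, the request can be wrapped in a small loop that prints only the status code of each response. This is a sketch, not part of the PR: the `send_requests` function and the `PROXY`, `URL`, and `BODY` variables are names made up for illustration, and the defaults are copied from the command above.

```shell
#!/bin/sh
# Assumed defaults, copied from the test command above; override via environment.
PROXY="${PROXY:-http://127.0.0.1:8000}"
URL="${URL:-http://localhost:11434/v1/chat/completions}"
BODY='{"model":"llama3.2","messages":[{"role":"user","content":"Why is the sky blue?"}]}'

# send_requests COUNT (hypothetical helper)
# Sends COUNT identical requests through the proxy and prints each HTTP status.
send_requests() {
  i=1
  while [ "$i" -le "$1" ]; do
    # -s: quiet, -k: skip TLS verification, -o /dev/null: discard the body,
    # -w '%{http_code}': print only the status code, -x: route via the proxy.
    status=$(curl -sk -o /dev/null -w '%{http_code}' -x "$PROXY" \
      -X POST "$URL" -H "Content-Type: application/json" -d "$BODY")
    echo "request $i: HTTP $status"
    i=$((i + 1))
  done
}
```

After sourcing the snippet, `send_requests 5` sends five requests. Which request first returns a 429 depends on how many tokens the model actually uses per call.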

Roughly the third request should fail with a 429.
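
Why roughly the third request: the arithmetic can be sketched with a small simulation. This is a hypothetical model, not the plugin's code. It assumes a request is rejected only once an earlier response has already used up one of the budgets, and that the counters are restored when the window resets. The per-call token usage is an input you supply, not a measured value.

```shell
#!/bin/sh
# Hypothetical model of the token accounting; not the plugin's actual code.
PROMPT_LIMIT=500       # promptTokenLimit from the config above
COMPLETION_LIMIT=500   # completionTokenLimit from the config above
prompt_left=$PROMPT_LIMIT
completion_left=$COMPLETION_LIMIT
last_status=""

# simulate_request PROMPT_TOKENS COMPLETION_TOKENS
# Sets last_status to 429 if either budget is already used up; otherwise sets
# it to 200 and subtracts the usage reported by the (simulated) response.
simulate_request() {
  if [ "$prompt_left" -le 0 ] || [ "$completion_left" -le 0 ]; then
    last_status=429
    return 0
  fi
  prompt_left=$((prompt_left - $1))
  completion_left=$((completion_left - $2))
  last_status=200
}

# reset_window models the counters being restored after resetTimeWindowSeconds.
reset_window() {
  prompt_left=$PROMPT_LIMIT
  completion_left=$COMPLETION_LIMIT
}
```

With an assumed usage of 30 prompt tokens and 250 completion tokens per call, `simulate_request 30 250` leaves `last_status` at 200 for the first two calls (the completion budget drops to 250, then 0) and at 429 for the third. Calling `reset_window` makes the next call succeed again.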

The schema validation error on startup is expected: the schema is added in this PR, so it is not yet available at the referenced URL on main.

@waldekmastykarz waldekmastykarz requested a review from a team as a code owner July 13, 2025 10:50
@garrytrinder garrytrinder requested a review from Copilot July 14, 2025 08:11
Contributor

Copilot AI left a comment

Pull Request Overview

Adds a new LanguageModelRateLimitingPlugin that enforces per-window token quotas and returns a throttling or custom response when a limit is exceeded.

  • Introduces JSON schemas for plugin configuration and custom response files
  • Implements the core plugin logic and a file-watcher loader for custom responses
  • Updates OpenAIModels classes from abstract to concrete to support deserialization

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| schemas/v1.0.0/languagemodelratelimitingplugin.schema.json | Defines config properties for the rate-limiting plugin |
| schemas/v1.0.0/languagemodelratelimitingplugin.customresponsefile.schema.json | Defines schema for custom error responses |
| DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs | Implements rate-limiting, token tracking, throttle/custom responses |
| DevProxy.Plugins/Behavior/LanguageModelRateLimitingCustomResponseLoader.cs | Loads and watches custom response files |
| DevProxy.Abstractions/LanguageModel/OpenAIModels.cs | Changed OpenAIRequest/Response from abstract classes to concrete classes |

Comments suppressed due to low confidence (4)

schemas/v1.0.0/languagemodelratelimitingplugin.schema.json:4

  • Add a "required" array to enforce mandatory properties (e.g., "promptTokenLimit", "completionTokenLimit", "resetTimeWindowSeconds") to ensure configuration completeness.
  "type": "object",

schemas/v1.0.0/languagemodelratelimitingplugin.customresponsefile.schema.json:5

  • Define a "required" array (e.g., ["body", "statusCode"]) so consumers must include these mandatory fields in custom response files.
  "type": "object",

DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs:38

  • Consider adding unit tests covering rate limiting logic (token decrement, reset window, throttle/custom response) to ensure behavior is validated and prevent regressions.
public sealed class LanguageModelRateLimitingPlugin(

DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs:155

  • The empty list literal '[]' may not compile or infer the correct type for headersList. Use 'new List()' to explicitly create an empty List.
                        [];

Comment thread DevProxy.Plugins/Behavior/LanguageModelRateLimitingPlugin.cs
@garrytrinder
Collaborator

image

@garrytrinder garrytrinder merged commit 7077cae into dotnet:main Jul 15, 2025
4 checks passed
@waldekmastykarz waldekmastykarz deleted the lmratelimitingplugin branch July 15, 2025 13:11
Development

Successfully merging this pull request may close these issues.

Proposal: LLMRateLimitingPlugin to simulate token-based throttling
