fix: Update OpenAI backend to parse logprobs and token distributions#13
Conversation
- Add unit tests to validate equal and uneven category prompt distribution - Extend `load_suite` to support limiting prompts by category balance - Update OpenAI backend to parse logprobs and token distributions
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Pull request overview
This PR improves prompt-suite sampling and inference result introspection by adding category-balanced prompt limiting in load_suite, updating the CLI to use it, and extending the OpenAI-compatible backend to request/parse chat-completions logprobs and top-token distributions.
Changes:
- Added unit tests validating equal/uneven category distribution behavior in
load_suite. - Extended
load_suite(..., num_prompts=...)to select prompts via round-robin across categories (instead of simple slicing in the CLI). - Updated the OpenAI-compatible chat backend to request and parse
logprobs/top_logprobs, populating per-token distributions and metadata.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tests/unit/test_loader_distribution.py | Adds unit coverage for category-balanced prompt selection and no-limit behavior. |
| src/infer_check/suites/loader.py | Implements num_prompts support with category-balanced selection and updated logging. |
| src/infer_check/cli.py | Delegates --num-prompts limiting to load_suite (removes post-load slicing). |
| src/infer_check/backends/openai_compat.py | Requests and parses chat logprobs/top-logprobs into InferenceResult fields. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
load_suite category distribution logicThere was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Introduce _ServerHTTPError for clearer HTTP error propagation in OpenAICompatBackend - Refine logprobs retry logic to use status codes instead of string matching - Adjust prompt category assignment to allow None values instead of defaulting to "default" - Safeguard token extraction in logprobs parsing
load_suiteto support limiting prompts by category balance