fix: Update OpenAI backend to parse logprobs and token distributions by NullPointerDepressiveDisorder · Pull Request #13 · NullPointerDepressiveDisorder/infer-check

NullPointerDepressiveDisorder · 2026-04-13T02:49:08Z

- Add unit tests to validate equal and uneven category prompt distribution
- Extend `load_suite` to support limiting prompts by category balance
- Update OpenAI backend to parse logprobs and token distributions

codecov · 2026-04-13T02:49:22Z

Codecov Report

❌ Patch coverage is 98.25581% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/infer_check/backends/openai_compat.py | 95.83% | 2 Missing and 1 partial ⚠️ |


Copilot

Pull request overview

This PR improves prompt-suite sampling and inference result introspection by adding category-balanced prompt limiting in load_suite, updating the CLI to use it, and extending the OpenAI-compatible backend to request/parse chat-completions logprobs and top-token distributions.

Changes:

- Added unit tests validating equal/uneven category distribution behavior in `load_suite`.
- Extended `load_suite(..., num_prompts=...)` to select prompts via round-robin across categories (instead of simple slicing in the CLI).
- Updated the OpenAI-compatible chat backend to request and parse `logprobs`/`top_logprobs`, populating per-token distributions and metadata.
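The round-robin selection described above could look roughly like the sketch below. It is not the code from this PR: `select_balanced` is a hypothetical helper, prompts are assumed to be plain dicts with an optional `category` key, and categories are visited in first-seen order.

```python
from typing import Optional


def select_balanced(prompts: list[dict], num_prompts: Optional[int] = None) -> list[dict]:
    """Hypothetical helper: pick up to num_prompts items, cycling across categories."""
    # No limit (or a limit that covers everything): return all prompts unchanged.
    if num_prompts is None or num_prompts >= len(prompts):
        return list(prompts)

    # Group prompts by category, preserving first-seen category order.
    # A missing category is grouped under the key None.
    buckets: dict = {}
    for prompt in prompts:
        buckets.setdefault(prompt.get("category"), []).append(prompt)

    selected: list[dict] = []
    index = 0
    # Take the index-th prompt from each category in turn until the limit is hit.
    while len(selected) < num_prompts:
        for items in buckets.values():
            if index < len(items):
                selected.append(items[index])
                if len(selected) == num_prompts:
                    break
        index += 1
    return selected
```

With uneven categories, smaller categories simply run out first and the remaining slots go to the larger ones, which is one plausible reading of the "uneven distribution" case the new tests cover.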

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tests/unit/test_loader_distribution.py | Adds unit coverage for category-balanced prompt selection and no-limit behavior. |
| src/infer_check/suites/loader.py | Implements `num_prompts` support with category-balanced selection and updated logging. |
| src/infer_check/cli.py | Delegates `--num-prompts` limiting to `load_suite` (removes post-load slicing). |
| src/infer_check/backends/openai_compat.py | Requests and parses chat logprobs/top-logprobs into `InferenceResult` fields. |
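For readers unfamiliar with the response shape, here is a minimal sketch of parsing chat-completions logprobs. It assumes the standard OpenAI layout, where `choices[0].logprobs.content` is a list of entries with `token`, `logprob`, and `top_logprobs`. The function name `parse_chat_logprobs` and its return shape are illustrative assumptions, not the PR's actual `InferenceResult` fields.

```python
import math
from typing import Any


def parse_chat_logprobs(
    response: dict[str, Any],
) -> tuple[list[str], list[float], list[dict[str, float]]]:
    """Hypothetical parser: returns (tokens, logprobs, per-token top distributions)."""
    tokens: list[str] = []
    logprobs: list[float] = []
    distributions: list[dict[str, float]] = []

    choices = response.get("choices") or []
    if not choices:
        return tokens, logprobs, distributions

    # "logprobs" may be absent or null when the server did not return them.
    content = (choices[0].get("logprobs") or {}).get("content") or []
    for entry in content:
        token = entry.get("token")
        logprob = entry.get("logprob")
        if token is None or logprob is None:
            continue  # skip malformed entries rather than raising
        tokens.append(token)
        logprobs.append(float(logprob))
        # Convert each alternative's log-probability into a probability.
        distributions.append(
            {
                alt["token"]: math.exp(alt["logprob"])
                for alt in entry.get("top_logprobs") or []
                if alt.get("token") is not None and alt.get("logprob") is not None
            }
        )
    return tokens, logprobs, distributions
```

Because `top_logprobs` only covers the top few alternatives, each per-token distribution here is generally a truncated one and will not sum to 1.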


Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.



Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.


- Introduce `_ServerHTTPError` for clearer HTTP error propagation in `OpenAICompatBackend`
- Refine logprobs retry logic to use status codes instead of string matching
- Adjust prompt category assignment to allow `None` values instead of defaulting to "default"
- Safeguard token extraction in logprobs parsing
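The status-code-based retry can be illustrated with the sketch below. All names are assumptions: `ServerHTTPError` stands in for the PR's `_ServerHTTPError`, the `post` callable abstracts the HTTP call, and the set of statuses treated as "logprobs rejected" (400 and 422) is a guess, not something this PR confirms.

```python
from typing import Any, Callable


class ServerHTTPError(Exception):
    """Hypothetical stand-in for the PR's _ServerHTTPError; carries the HTTP status."""

    def __init__(self, status_code: int, body: str = "") -> None:
        super().__init__(f"HTTP {status_code}: {body}")
        self.status_code = status_code
        self.body = body


# Assumption: statuses suggesting the server rejected the request parameters.
LOGPROBS_REJECTED = frozenset({400, 422})


def post_with_logprobs_fallback(
    post: Callable[[dict[str, Any]], dict[str, Any]],
    payload: dict[str, Any],
    top_logprobs: int = 5,
) -> dict[str, Any]:
    """Try the request with logprobs; retry once without them on a rejection status."""
    # Build a new dict so the caller's payload is not mutated.
    with_logprobs = {**payload, "logprobs": True, "top_logprobs": top_logprobs}
    try:
        return post(with_logprobs)
    except ServerHTTPError as exc:
        # Branch on the numeric status rather than matching error-message text.
        if exc.status_code not in LOGPROBS_REJECTED:
            raise
    # The server appears not to accept logprobs: fall back to the plain request.
    return post(payload)
```

Checking `exc.status_code` keeps the fallback independent of how a particular server words its error messages, which is the motivation the commit message gives for moving away from string matching.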

test: add coverage for load_suite category distribution logic

9829426

- Add unit tests to validate equal and uneven category prompt distribution
- Extend `load_suite` to support limiting prompts by category balance
- Update OpenAI backend to parse logprobs and token distributions

Copilot AI review requested due to automatic review settings April 13, 2026 02:49

Copilot started reviewing on behalf of NullPointerDepressiveDisorder April 13, 2026 02:49

Copilot AI reviewed Apr 13, 2026


Comment thread src/infer_check/suites/loader.py

Comment thread src/infer_check/backends/openai_compat.py Outdated

Comment thread src/infer_check/backends/openai_compat.py Outdated

Comment thread src/infer_check/backends/openai_compat.py

NullPointerDepressiveDisorder added 2 commits April 13, 2026 09:01

fix: improve logprobs handling in chat completions

9738475

test: add logprobs handling tests for chat completions

7c44989

NullPointerDepressiveDisorder requested a review from Copilot April 13, 2026 16:06

Copilot started reviewing on behalf of NullPointerDepressiveDisorder April 13, 2026 16:07

NullPointerDepressiveDisorder changed the title ~~test: add coverage for load_suite category distribution logic~~ fix: Update OpenAI backend to parse logprobs and token distributions Apr 13, 2026

Copilot AI reviewed Apr 13, 2026


Comment thread src/infer_check/suites/loader.py

Comment thread src/infer_check/backends/openai_compat.py Outdated

Comment thread src/infer_check/backends/openai_compat.py Outdated

refactor: extract chat completion POST logic and improve logprobs retry handling

bd0173f

NullPointerDepressiveDisorder requested a review from Copilot April 13, 2026 19:24

Copilot started reviewing on behalf of NullPointerDepressiveDisorder April 13, 2026 19:25

Copilot AI reviewed Apr 13, 2026


Comment thread src/infer_check/backends/openai_compat.py

Comment thread src/infer_check/backends/openai_compat.py Outdated

Comment thread src/infer_check/backends/openai_compat.py

Comment thread src/infer_check/suites/loader.py

NullPointerDepressiveDisorder merged commit aa75120 into main Apr 14, 2026
5 checks passed

NullPointerDepressiveDisorder deleted the fix/openai-compat branch April 14, 2026 08:55

NullPointerDepressiveDisorder merged 5 commits into main from fix/openai-compat

2 participants
