
--verbose_response parameter #4205

Open
dkalinowski wants to merge 3 commits into main from verbose

Conversation

@dkalinowski
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings May 13, 2026 10:43
@dkalinowski added the "WIP Do not merge until resolved" label on May 13, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces a new --verbose_response server setting to help debug OpenAI-compatible LLM chat/text completion responses by embedding the raw prompt and decoded output details into unary JSON responses.

Changes:

  • Add --verbose_response CLI flag and plumb it into ServerSettingsImpl.
  • Capture the post-template prompt in the request handler when verbose mode is enabled.
  • Extend OpenAI chat/text completions unary JSON serialization to optionally emit a __verbose object containing the prompt and raw decoded outputs (an example response shape is sketched below).
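
For illustration only, a unary chat completion response with this flag enabled might then carry an extra top-level section shaped roughly like this. The __verbose, prompt, and raw_outputs names come from the diff quoted further down; the standard OpenAI fields are elided here, and the exact contents of each raw output entry are not shown in this PR:

    {
      "id": "...",
      "object": "chat.completion",
      "choices": [ ... ],
      "__verbose": {
        "prompt": "<final post-template prompt, captured before tokenization>",
        "raw_outputs": [ ... ]
      }
    }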

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Summary per file:

  • src/llm/servable.cpp: Enables verbose mode per-request and passes the final prompt text into the API handler before tokenization.
  • src/llm/apis/openai_completions.cpp: Adds an optional __verbose section to unary responses for multiple result types (GenerationOutput, EncodedResults, VLM).
  • src/llm/apis/openai_api_handler.hpp: Adds stored state and accessors for verbose mode (verboseResponse, verbosePrompt).
  • src/cli_parser.cpp: Adds the --verbose_response CLI option and maps it into server settings (a sketch of the wiring follows below).
  • src/capi_frontend/server_settings.hpp: Adds a verboseResponse boolean to ServerSettingsImpl.
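
A minimal sketch of how the CLI wiring might look, assuming a cxxopts-style parser as used in src/cli_parser.cpp; the option group, the result variable, and the description text are assumptions, not taken from the PR:

    // Assumed cxxopts-style registration; group name and description are hypothetical.
    options.add_options("server")
        ("verbose_response",
         "Embed a __verbose section with the raw prompt and decoded outputs "
         "into unary completion responses (debugging aid)",
         cxxopts::value<bool>()->default_value("false"));

    // After parsing, map the flag into server settings.
    serverSettings.verboseResponse = result["verbose_response"].as<bool>();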
Comments suppressed due to low confidence (1)

src/llm/apis/openai_completions.cpp:401

  • The __verbose serialization logic is duplicated across the three serializeUnaryResponse(...) overloads. Consider extracting it into a shared helper to reduce maintenance overhead and prevent the overloads from diverging over time (a sketch of such a helper follows after the quoted lines below).
    if (isVerboseResponse()) {
        jsonResponse.StartObject("__verbose");
        jsonResponse.String("prompt", getVerbosePrompt());
        jsonResponse.StartArray("raw_outputs");
        for (const ov::genai::GenerationOutput& generationOutput : generationOutputs) {

Comment on lines +397 to +401
    if (isVerboseResponse()) {
        jsonResponse.StartObject("__verbose");
        jsonResponse.String("prompt", getVerbosePrompt());
        jsonResponse.StartArray("raw_outputs");
        for (const ov::genai::GenerationOutput& generationOutput : generationOutputs) {
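
One possible shape for the shared helper suggested above, as a sketch only: it assumes the writer object exposes EndArray()/EndObject() counterparts to the StartArray()/StartObject() calls shown in the diff, and the helper and callback names are hypothetical. Each overload keeps its own per-output loop and passes it in, so the three call sites cannot drift apart on the envelope:

    #include <string>

    // Hypothetical shared helper; not part of the PR as posted.
    template <typename Writer, typename RawOutputsFn>
    static void writeVerboseSection(Writer& jsonResponse,
                                    const std::string& verbosePrompt,
                                    RawOutputsFn&& writeRawOutputs) {
        jsonResponse.StartObject("__verbose");
        jsonResponse.String("prompt", verbosePrompt);
        jsonResponse.StartArray("raw_outputs");
        writeRawOutputs(jsonResponse);  // type-specific serialization supplied by the caller
        jsonResponse.EndArray();        // assumed counterpart to StartArray
        jsonResponse.EndObject();       // assumed counterpart to StartObject
    }

A call site in the GenerationOutput overload could then reduce to:

    if (isVerboseResponse()) {
        writeVerboseSection(jsonResponse, getVerbosePrompt(), [&](auto& writer) {
            for (const ov::genai::GenerationOutput& generationOutput : generationOutputs) {
                // ... existing per-output serialization, unchanged ...
            }
        });
    }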
