Tokens in VLMPipeline output #3808

Open
dkalinowski wants to merge 6 commits into openvinotoolkit:master from dkalinowski:tokens-in-vlm-output

Conversation

@dkalinowski
Collaborator

Description

Extend VLMDecodedResults (and DecodedResults) with an additional field: tokens.
OVMS requires tokens for its new tool parsers to work, for example the parsers for LFM or gemma models.

CVS-184756

Checklist:

  • This PR follows GenAI Contributing guidelines.
  • Tests have been updated or added to cover the new code.
  • This PR fully addresses the ticket. A follow-up PR in OVMS will add the LFM/gemma parsers.
  • I have made corresponding changes to the documentation.

@github-actions (bot) added labels on May 6, 2026: category: visual language (Visual language pipeline), category: continuous batching (Continuous batching), category: LLM (LLM pipeline (stateful, static)), category: speculative decoding (Speculative decoding), category: Python API (Python API for GenAI), category: CPP API (Changes in GenAI C++ public headers), category: GGUF (GGUF file reader), category: JS API (GenAI JS API)
Contributor

Copilot AI left a comment

Pull request overview

This PR extends generation outputs to include raw token IDs alongside decoded texts, making DecodedResults / VLMDecodedResults usable for downstream token-based tooling (e.g., OVMS tool parsers).

Changes:

  • Added tokens to DecodedResults (C++), and populated it across LLM/VLM pipelines (including speculative decoding and continuous batching).
  • Exposed tokens through Python (pybind + .pyi) and JavaScript (N-API helper + TS wrappers/classes).
  • Added Python tests validating token availability and round-trip tokenizer.decode(tokens) == text.
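The parallel layout of texts and tokens, and the round-trip property the new tests are described as checking, can be pictured with a small stand-in. This is only a sketch under stated assumptions: `DecodedResultsSketch`, `ToyTokenizer`, and `round_trips` are hypothetical names, not part of the GenAI API, and only the field names `texts` and `tokens` come from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the result type; only the field names
// `texts` and `tokens` are taken from the PR.
struct DecodedResultsSketch {
    std::vector<std::string> texts;
    std::vector<std::vector<int64_t>> tokens;  // tokens[i] belongs to texts[i]
};

// Hypothetical toy detokenizer: maps each token id to a fixed string piece.
struct ToyTokenizer {
    std::unordered_map<int64_t, std::string> vocab;

    std::string decode(const std::vector<int64_t>& ids) const {
        std::string out;
        for (int64_t id : ids) {
            out += vocab.at(id);  // throws std::out_of_range for unknown ids
        }
        return out;
    }
};

// Returns true when every text has a token sequence and decoding that
// sequence reproduces the text exactly.
bool round_trips(const DecodedResultsSketch& res, const ToyTokenizer& tok) {
    if (res.tokens.size() != res.texts.size()) {
        return false;
    }
    for (std::size_t i = 0; i < res.texts.size(); ++i) {
        if (tok.decode(res.tokens[i]) != res.texts[i]) {
            return false;
        }
    }
    return true;
}
```

A downstream consumer such as a tool parser could rely on the same index correspondence: the ids for the i-th generated sequence sit at `tokens[i]`.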

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Adds a VLM test validating tokens presence/type and decode round-trip. |
| tests/python_tests/test_llm_pipeline.py | Adds an LLM test validating DecodedResults.tokens and decode round-trip. |
| src/python/py_openvino_genai.cpp | Exposes DecodedResults.tokens via pybind11. |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates Python typing stubs to include DecodedResults.tokens. |
| src/js/src/helper.cpp | Adds tokens field serialization to JS objects for LLM/VLM decoded results. |
| src/js/lib/pipelines/vlmPipeline.ts | Threads tokens through VLM pipeline callback result into VLMDecodedResults. |
| src/js/lib/pipelines/llmPipeline.ts | Threads tokens through LLM pipeline callback result into DecodedResults. |
| src/js/lib/decodedResults.ts | Adds tokens field/constructor arg to DecodedResults and VLMDecodedResults. |
| src/cpp/src/visual_language/pipeline.cpp | Populates decoded.tokens from encoded results in VLM pipeline. |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Propagates tokens in continuous batching adapter results. |
| src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp | Populates DecodedResults.tokens for stateful speculative decoding. |
| src/cpp/src/llm/pipeline_static.cpp | Populates DecodedResults.tokens for LLM generation. |
| src/cpp/src/llm/pipeline_stateful.cpp | Populates DecodedResults.tokens in stateful LLM decoded-result assembly. |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Adds tokens to VLMDecodedResults produced by continuous batching path. |
| src/cpp/include/openvino/genai/llm_pipeline.hpp | Adds tokens member to the public DecodedResults C++ type. |

Comment thread src/js/src/helper.cpp Outdated
Comment thread src/cpp/src/llm/pipeline_static.cpp
Comment thread src/cpp/src/llm/pipeline_stateful.cpp Outdated
Comment thread src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp Outdated
Comment thread src/cpp/src/visual_language/pipeline.cpp
Comment thread src/cpp/src/continuous_batching/pipeline_base.cpp
Comment on lines 71 to 76
std::vector<std::string> texts;
std::vector<float> scores;
std::vector<GenerationFinishReason> finish_reasons;
/// @brief Generated token ids per sequence (parallels @ref texts).
std::vector<std::vector<int64_t>> tokens;
PerfMetrics perf_metrics;
Copilot AI review requested due to automatic review settings May 7, 2026 12:45
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

Comment thread src/js/src/helper.cpp
Comment on lines +1049 to +1054
Napi::Array js_array = Napi::Array::New(env, value.size());
for (size_t i = 0; i < value.size(); ++i) {
    const auto& sequence = value[i];
    Napi::BigInt64Array sequence_array = Napi::BigInt64Array::New(env, sequence.size());
    std::copy(sequence.begin(), sequence.end(), sequence_array.Data());
    js_array[i] = sequence_array;
Collaborator Author

done

Comment thread src/cpp/src/continuous_batching/pipeline_base.cpp
Comment thread src/cpp/src/continuous_batching/pipeline_base.cpp
Contributor

@sgonorov sgonorov left a comment

Make sure to address the Copilot comments first.

Contributor

@yatarkan yatarkan left a comment

@Retribution98 Could you please review the JS part?

@dkalinowski
Collaborator Author

@sgonorov I addressed the Copilot comments except one: #3808 (comment)

It makes a valid point: adding the field will break ABI for C++ users who link against a prebuilt shared library.
We need to either accept that or hide the data members behind accessors and the pimpl pattern.
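To make the trade-off concrete, here is a minimal sketch of both sides, under stated assumptions: `ResultsV1`, `ResultsV2`, and `OpaqueResults` are hypothetical names, not the GenAI types, and this is not a proposal for the actual header. The first part shows that adding a data member changes the size and member offsets of a public struct, which is what breaks clients compiled against the old layout. The second part keeps the data in an `Impl` that only the library sees, so the public class holds a single pointer and exposes non-inline accessors.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "before" and "after" layouts of a public struct.
struct ResultsV1 {
    std::vector<std::string> texts;
    std::vector<float> scores;
};
struct ResultsV2 {
    std::vector<std::string> texts;
    std::vector<std::vector<int64_t>> tokens;  // newly added member
    std::vector<float> scores;                 // its offset shifts as a result
};
static_assert(sizeof(ResultsV2) > sizeof(ResultsV1),
              "adding a data member grows the struct");

// Hypothetical pimpl variant: the public class only stores a pointer.
class OpaqueResults {
public:
    OpaqueResults();
    ~OpaqueResults();
    OpaqueResults(OpaqueResults&&) noexcept;
    OpaqueResults& operator=(OpaqueResults&&) noexcept;

    // Accessors are declared here but defined out of line, so client
    // code never depends on the layout of Impl.
    const std::vector<std::string>& texts() const;
    const std::vector<std::vector<int64_t>>& tokens() const;
    void add(std::string text, std::vector<int64_t> ids);

private:
    struct Impl;  // in a real library, defined only in the .cpp file
    std::unique_ptr<Impl> m_impl;
};

// --- everything below would live in the library's .cpp file ---
struct OpaqueResults::Impl {
    std::vector<std::string> texts;
    std::vector<std::vector<int64_t>> tokens;  // members can be added here later
};

OpaqueResults::OpaqueResults() : m_impl(std::make_unique<Impl>()) {}
OpaqueResults::~OpaqueResults() = default;
OpaqueResults::OpaqueResults(OpaqueResults&&) noexcept = default;
OpaqueResults& OpaqueResults::operator=(OpaqueResults&&) noexcept = default;

const std::vector<std::string>& OpaqueResults::texts() const {
    return m_impl->texts;
}
const std::vector<std::vector<int64_t>>& OpaqueResults::tokens() const {
    return m_impl->tokens;
}
void OpaqueResults::add(std::string text, std::vector<int64_t> ids) {
    m_impl->texts.push_back(std::move(text));
    m_impl->tokens.push_back(std::move(ids));
}
```

The cost of the second variant is an extra heap allocation and an indirection per access, plus a source-level change for existing users who read the members directly today.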

@Wovchena mentioned this pull request on May 14, 2026
1 task
Stanley00 pushed a commit to stanley-fork/openvino.genai that referenced this pull request May 15, 2026
## Description
From openvinotoolkit#3808 (comment)

## Checklist:
- [x] This PR follows [GenAI Contributing guidelines](https://github.com/openvinotoolkit/openvino.genai?tab=contributing-ov-file#contributing).
N/A Tests have been updated or added to cover the new code.
N/A This PR fully addresses the ticket.
N/A I have made corresponding changes to the documentation.

Contributor

@Retribution98 Retribution98 left a comment

JS part looks good

Labels

category: continuous batching (Continuous batching), category: CPP API (Changes in GenAI C++ public headers), category: GGUF (GGUF file reader), category: JS API (GenAI JS API), category: LLM (LLM pipeline (stateful, static)), category: Python API (Python API for GenAI), category: speculative decoding (Speculative decoding), category: visual language (Visual language pipeline)

6 participants