Tokens in VLMPipeline output #3808
Pull request overview
This PR extends generation outputs to include raw token IDs alongside decoded texts, making DecodedResults / VLMDecodedResults usable for downstream token-based tooling (e.g., OVMS tool parsers).
Changes:
- Added `tokens` to `DecodedResults` (C++) and populated it across LLM/VLM pipelines (including speculative decoding and continuous batching).
- Exposed `tokens` through Python (pybind + `.pyi`) and JavaScript (N-API helper + TS wrappers/classes).
- Added Python tests validating token availability and the round trip `tokenizer.decode(tokens) == text`.
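
The round-trip property that the new tests check can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the library's code: `ToyDecodedResults`, `toy_decode`, and `round_trips` are hypothetical names, and the toy detokenizer only maps each id to a letter so that the check is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the result type: only the two fields relevant here.
struct ToyDecodedResults {
    std::vector<std::string> texts;
    std::vector<std::vector<int64_t>> tokens;  // tokens[i] belongs to texts[i]
};

// Hypothetical toy detokenizer: maps each non-negative id to one lowercase letter.
// A real tokenizer is far more complex; this only makes the check runnable.
std::string toy_decode(const std::vector<int64_t>& ids) {
    std::string out;
    for (int64_t id : ids) {
        out.push_back(static_cast<char>('a' + id % 26));
    }
    return out;
}

// Returns true when both vectors have the same length and every
// token sequence decodes back to the text at the same index.
bool round_trips(const ToyDecodedResults& res) {
    if (res.texts.size() != res.tokens.size()) {
        return false;
    }
    for (std::size_t i = 0; i < res.texts.size(); ++i) {
        if (toy_decode(res.tokens[i]) != res.texts[i]) {
            return false;
        }
    }
    return true;
}
```

With a real pipeline, the same shape of check would call the pipeline's own tokenizer instead of `toy_decode`.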
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/python_tests/test_vlm_pipeline.py | Adds a VLM test validating tokens presence/type and decode round-trip. |
| tests/python_tests/test_llm_pipeline.py | Adds an LLM test validating DecodedResults.tokens and decode round-trip. |
| src/python/py_openvino_genai.cpp | Exposes DecodedResults.tokens via pybind11. |
| src/python/openvino_genai/py_openvino_genai.pyi | Updates Python typing stubs to include DecodedResults.tokens. |
| src/js/src/helper.cpp | Adds tokens field serialization to JS objects for LLM/VLM decoded results. |
| src/js/lib/pipelines/vlmPipeline.ts | Threads tokens through VLM pipeline callback result into VLMDecodedResults. |
| src/js/lib/pipelines/llmPipeline.ts | Threads tokens through LLM pipeline callback result into DecodedResults. |
| src/js/lib/decodedResults.ts | Adds tokens field/constructor arg to DecodedResults and VLMDecodedResults. |
| src/cpp/src/visual_language/pipeline.cpp | Populates decoded.tokens from encoded results in VLM pipeline. |
| src/cpp/src/visual_language/continuous_batching_adapter.hpp | Propagates tokens in continuous batching adapter results. |
| src/cpp/src/speculative_decoding/stateful/stateful_pipeline_base.cpp | Populates DecodedResults.tokens for stateful speculative decoding. |
| src/cpp/src/llm/pipeline_static.cpp | Populates DecodedResults.tokens for LLM generation. |
| src/cpp/src/llm/pipeline_stateful.cpp | Populates DecodedResults.tokens in stateful LLM decoded-result assembly. |
| src/cpp/src/continuous_batching/pipeline_base.cpp | Adds tokens to VLMDecodedResults produced by continuous batching path. |
| src/cpp/include/openvino/genai/llm_pipeline.hpp | Adds tokens member to the public DecodedResults C++ type. |

    std::vector<std::string> texts;
    std::vector<float> scores;
    std::vector<GenerationFinishReason> finish_reasons;
    /// @brief Generated token ids per sequence (parallels @ref texts).
    std::vector<std::vector<int64_t>> tokens;
    PerfMetrics perf_metrics;

    Napi::Array js_array = Napi::Array::New(env, value.size());
    for (size_t i = 0; i < value.size(); ++i) {
        const auto& sequence = value[i];
        Napi::BigInt64Array sequence_array = Napi::BigInt64Array::New(env, sequence.size());
        std::copy(sequence.begin(), sequence.end(), sequence_array.Data());
        js_array[i] = sequence_array;

sgonorov left a comment:
Make sure to polish copilot comments first.
yatarkan left a comment:
@Retribution98 Could you please review the JS part?
@sgonorov I cleaned up the Copilot comments except one: #3808 (comment). It makes a valid point: the change will break ABI for C++ users who link against a prebuilt shared library.
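
The ABI concern can be illustrated with a small sketch. The two structs below are hypothetical, simplified layouts (the real `DecodedResults` has more members, and `double` stands in for `PerfMetrics`); they only show that inserting a member before `perf_metrics` changes both the object size and the offset of every member that follows it. Code compiled against the old header would therefore read the wrong bytes from an object produced by a library built with the new header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout before the change.
struct ResultsBefore {
    std::vector<std::string> texts;
    std::vector<float> scores;
    double perf_metrics;  // stand-in for PerfMetrics
};

// Hypothetical layout after the change: tokens is inserted before perf_metrics.
struct ResultsAfter {
    std::vector<std::string> texts;
    std::vector<float> scores;
    std::vector<std::vector<int64_t>> tokens;  // new member
    double perf_metrics;
};

// Byte offset of perf_metrics inside T, measured on a live object.
template <typename T>
std::ptrdiff_t perf_metrics_offset() {
    T obj{};
    return reinterpret_cast<const char*>(&obj.perf_metrics) -
           reinterpret_cast<const char*>(&obj);
}

// The new member makes the object larger, so old and new binaries
// disagree about how much memory one result occupies.
static_assert(sizeof(ResultsAfter) > sizeof(ResultsBefore),
              "adding a member grows the struct");
```

Comparing `perf_metrics_offset<ResultsBefore>()` with `perf_metrics_offset<ResultsAfter>()` shows the shifted offset; the exact numbers depend on the compiler and standard library.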
## Description
From openvinotoolkit#3808 (comment)
## Checklist:
- [x] This PR follows [GenAI Contributing guidelines](https://github.com/openvinotoolkit/openvino.genai?tab=contributing-ov-file#contributing).
- N/A Tests have been updated or added to cover the new code.
- N/A This PR fully addresses the ticket.
- N/A I have made corresponding changes to the documentation.
Description
Extend VLMDecodedResults (and DecodedResults) with an additional field: tokens.
OVMS requires tokens for its new tool parsers to work, for example those for LFM or Gemma models.
CVS-184756
Checklist: