Include tokens in text streamer callback #3802
Draft
mzegla wants to merge 1 commit into
Pull request overview
This PR extends ov::genai::TextStreamer so callbacks can optionally receive both the decoded text chunk and the corresponding token IDs, enabling advanced streaming/post-processing use cases (e.g., displaying text with special tokens skipped while still collecting those tokens for downstream logic).
Changes:
- Added a new TextStreamer constructor overload that accepts a tokens-aware callback (text, tokens).
- Updated streaming logic to track token indices for each flushed text chunk and pass corresponding tokens to the callback.
- Added Python bindings for the tokens-aware callback constructor.
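To make the chunk-to-token bookkeeping described above concrete, here is a minimal sketch in plain Python. It is not the library's TextStreamer: the class `ToyTokensAwareStreamer`, the `VOCAB` table, the `decode` helper, and the token IDs are all hypothetical, and the real streamer applies additional buffering rules that are omitted here. It only illustrates how a token that produces no visible text can stay pending and be delivered with the next flushed chunk.

```python
from typing import Callable, Dict, List

# Hypothetical toy vocabulary; IDs and strings are made up for illustration.
VOCAB: Dict[int, str] = {1: "Hello", 2: " world", 3: "!", 100: "<|tool|>"}
SPECIAL_IDS = {100}


def decode(tokens: List[int]) -> str:
    """Toy detokenizer that skips special tokens (stand-in for a real tokenizer)."""
    return "".join(VOCAB[t] for t in tokens if t not in SPECIAL_IDS)


class ToyTokensAwareStreamer:
    """Simplified stand-in, not the actual ov::genai::TextStreamer."""

    def __init__(self, callback: Callable[[str, List[int]], None]) -> None:
        self.callback = callback
        self.tokens_cache: List[int] = []
        self.printed_len = 0        # number of characters already delivered
        self.printed_token_idx = 0  # number of tokens already delivered

    def write(self, token: int) -> None:
        self.tokens_cache.append(token)
        text = decode(self.tokens_cache)
        if len(text) > self.printed_len:
            # New visible text: deliver it with every token not yet delivered.
            chunk_text = text[self.printed_len:]
            chunk_tokens = self.tokens_cache[self.printed_token_idx:]
            self.printed_len = len(text)
            self.printed_token_idx = len(self.tokens_cache)
            self.callback(chunk_text, chunk_tokens)
        # Otherwise the token added no visible text; it stays pending
        # and is delivered together with the next chunk.


chunks = []
streamer = ToyTokensAwareStreamer(lambda text, tokens: chunks.append((text, tokens)))
for tok in [1, 100, 2, 3]:
    streamer.write(tok)
```

In this toy run the special token 100 adds no text, so it travels with the following chunk `" world"` rather than triggering a callback of its own.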
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/python/py_streamers.cpp | Adds a second TextStreamer Python constructor accepting a (text, tokens) callback. |
| src/cpp/src/text_streamer.cpp | Implements tokens-aware callback wiring and token-chunk tracking/flushing logic. |
| src/cpp/include/openvino/genai/text_streamer.hpp | Exposes the new constructor overload and adds token index tracking state. |
Comment on lines 109 to +116
    py::class_<TextStreamer, std::shared_ptr<TextStreamer>, StreamerBase>(m, "TextStreamer", text_streamer_docstring)
        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string)> callback, const std::map<std::string, py::object>& detokenization_params) {
                return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
            }),
            py::arg("tokenizer"),
            py::arg("callback"),
            py::arg("detokenization_params") = ov::AnyMap({}))
        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
Comment on lines +116 to +121
        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
                return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
            }),
            py::arg("tokenizer"),
            py::arg("callback"),
            py::arg("detokenization_params") = ov::AnyMap({}))
Comment on lines +147 to +156
    if (text.size() <= m_printed_len) {
        // No new text, but flush any unprinted special tokens
        auto chunk_tokens = std::vector<int64_t>(m_tokens_cache.begin() + m_printed_token_idx, m_tokens_cache.end());
        m_tokens_cache.clear();
        m_decoded_lengths.clear();
        m_printed_len = 0;
        m_printed_token_idx = 0;
        if (!chunk_tokens.empty()) {
            m_subword_callback("", chunk_tokens);
        }
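The branch quoted above can be read as "no new text, but some tokens were never handed to the callback". A rough Python rendering of that reading follows. The function `flush_pending` and its parameters are hypothetical names chosen to mirror the C++ members, and the sketch resets only the token cache rather than all four members.

```python
from typing import Callable, List


def flush_pending(
    text: str,
    tokens_cache: List[int],
    printed_len: int,
    printed_token_idx: int,
    callback: Callable[[str, List[int]], None],
) -> None:
    """Deliver undelivered tokens with an empty text chunk when no new text exists."""
    if len(text) <= printed_len:
        # Copy the undelivered tail before clearing the cache.
        chunk_tokens = tokens_cache[printed_token_idx:]
        tokens_cache.clear()
        if chunk_tokens:
            callback("", chunk_tokens)


received = []
cache = [1, 2, 100]  # 100 is a hypothetical special-token ID that decodes to no text
flush_pending(
    "Hello world",
    cache,
    printed_len=11,
    printed_token_idx=2,
    callback=lambda text, tokens: received.append((text, tokens)),
)
```

A consumer of such a callback would therefore need to tolerate chunks whose text is empty but whose token list is not.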
Comment on lines +33 to +35
    /// @brief Construct with a tokens-aware callback receiving both the decoded text chunk and the token IDs that produced it
    TextStreamer(const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const ov::AnyMap& detokenization_params = {});
Comment on lines +116 to +121
        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
                return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
            }),
            py::arg("tokenizer"),
            py::arg("callback"),
            py::arg("detokenization_params") = ov::AnyMap({}))
Description
This change passes token IDs to the text streamer callback alongside the decoded text. This supports more advanced use cases, for example showing the user text with special tokens skipped while still collecting those special tokens for post-processing.
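As an illustration of the use case in the description, a consumer-side callback might look like the sketch below. It assumes the `(text, tokens)` callback shape proposed in this PR. The special-token IDs, the names `on_chunk`, `shown_parts`, and `special_seen`, and the hand-written chunk list that stands in for a real generation run are all hypothetical, and the callback's return value (which the C++ signature declares as CallbackTypeVariant) is left out.

```python
from typing import List

SPECIAL_IDS = {100, 101}  # hypothetical special-token IDs

shown_parts: List[str] = []   # text shown to the user
special_seen: List[int] = []  # special tokens kept for post-processing


def on_chunk(text: str, tokens: List[int]) -> None:
    """Show the decoded text and remember any special tokens in the chunk."""
    shown_parts.append(text)
    special_seen.extend(t for t in tokens if t in SPECIAL_IDS)


# Hand-written chunks standing in for what a streamer might deliver.
for text, tokens in [("Hello", [1]), (" world", [100, 2]), ("", [101])]:
    on_chunk(text, tokens)

display_text = "".join(shown_parts)
```

With the bindings from this PR, a function of this shape would presumably be passed as the `callback` argument of the TextStreamer constructor; that wiring is not shown here because it depends on the final API.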