
Include tokens in text streamer callback #3802

Draft
mzegla wants to merge 1 commit into openvinotoolkit:master from mzegla:stream_tokens

Conversation

@mzegla
Collaborator

@mzegla mzegla commented May 5, 2026

Description

This change adds token IDs to the text streamer callback to support more advanced use cases, for example when the text must be shown to the user without special tokens while those special tokens are still needed for post-processing.
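A minimal sketch of that use case, written against a stand-in rather than the real API: `on_chunk`, the `SPECIAL_TOKEN_IDS` set, and the hand-written `chunks` list are all hypothetical, and the token IDs are made up. It only illustrates what a `(text, tokens)` callback could do with its two arguments.

```python
# Hypothetical special-token IDs; real values depend on the tokenizer.
SPECIAL_TOKEN_IDS = {100, 101}

shown_text = []       # what the user sees
special_tokens = []   # kept aside for post-processing


def on_chunk(text: str, tokens: list) -> bool:
    """Tokens-aware callback: display text, remember special tokens."""
    if text:
        shown_text.append(text)
    special_tokens.extend(t for t in tokens if t in SPECIAL_TOKEN_IDS)
    # Assumption: a falsy return value means "keep streaming".
    return False


# Hand-written stand-in for the chunks a streamer might deliver.
chunks = [
    ("Hello", [7, 8]),
    ("", [100]),        # special token that produced no visible text
    (" world", [9]),
    ("", [101]),
]
for text, tokens in chunks:
    on_chunk(text, tokens)

print("".join(shown_text))
print(special_tokens)
```

The displayed text never contains the special tokens, yet their IDs are still available once streaming is done.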

Copilot AI review requested due to automatic review settings May 5, 2026 13:56
@github-actions github-actions bot added the category: Python API (Python API for GenAI), category: CPP API (Changes in GenAI C++ public headers), and category: text streamer labels May 5, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR extends ov::genai::TextStreamer so callbacks can optionally receive both the decoded text chunk and the corresponding token IDs, enabling advanced streaming/post-processing use cases (e.g., displaying text with special tokens skipped while still collecting those tokens for downstream logic).

Changes:

  • Added a new TextStreamer constructor overload that accepts a tokens-aware callback (text, tokens).
  • Updated streaming logic to track token indices for each flushed text chunk and pass corresponding tokens to the callback.
  • Added Python bindings for the tokens-aware callback constructor.
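The token-index tracking in the second bullet can be sketched roughly as follows. This is a simplified Python stand-in, not the PR's implementation: `ToyTokenStreamer`, `VOCAB`, and `decode` are hypothetical, and the toy flushes text as soon as it appears instead of applying any delay logic the real streamer may have. The attribute names echo `m_tokens_cache`, `m_printed_len`, and `m_printed_token_idx` from the diff.

```python
from typing import Callable, Dict, List

# Toy vocabulary (hypothetical). ID 0 plays the role of a special token
# that decodes to nothing when special tokens are skipped.
VOCAB: Dict[int, str] = {0: "", 1: "Hel", 2: "lo", 3: " wor", 4: "ld"}


def decode(tokens: List[int]) -> str:
    return "".join(VOCAB[t] for t in tokens)


class ToyTokenStreamer:
    def __init__(self, callback: Callable[[str, List[int]], None]):
        self.callback = callback
        self.tokens_cache: List[int] = []
        self.printed_len = 0         # characters already delivered
        self.printed_token_idx = 0   # tokens already delivered

    def write(self, token: int) -> None:
        self.tokens_cache.append(token)
        text = decode(self.tokens_cache)
        if len(text) <= self.printed_len:
            return  # no new text yet; the token stays pending
        chunk = text[self.printed_len:]
        chunk_tokens = self.tokens_cache[self.printed_token_idx:]
        self.printed_len = len(text)
        self.printed_token_idx = len(self.tokens_cache)
        self.callback(chunk, chunk_tokens)


received = []
streamer = ToyTokenStreamer(lambda text, toks: received.append((text, toks)))
for tok in [1, 0, 2, 3, 4]:
    streamer.write(tok)
print(received)
```

In this toy, the special token `0` produces no text of its own, so it travels with the next chunk that does.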

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/python/py_streamers.cpp | Adds a second TextStreamer Python constructor accepting a (text, tokens) callback. |
| src/cpp/src/text_streamer.cpp | Implements tokens-aware callback wiring and token-chunk tracking/flushing logic. |
| src/cpp/include/openvino/genai/text_streamer.hpp | Exposes the new constructor overload and adds token index tracking state. |

Comment on lines 109 to +116
py::class_<TextStreamer, std::shared_ptr<TextStreamer>, StreamerBase>(m, "TextStreamer", text_streamer_docstring)
    .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string)> callback, const std::map<std::string, py::object>& detokenization_params) {
            return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
        }),
        py::arg("tokenizer"),
        py::arg("callback"),
        py::arg("detokenization_params") = ov::AnyMap({}))
    .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
Comment on lines +116 to +121
    .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
            return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
        }),
        py::arg("tokenizer"),
        py::arg("callback"),
        py::arg("detokenization_params") = ov::AnyMap({}))
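The two constructors above differ only in the callback's signature. Whether a Python callable alone is enough for the binding layer to pick the right overload is not something this excerpt settles. If it turns out not to be, one option is to inspect the callable's arity explicitly. The sketch below shows that idea in plain Python; `accepts_tokens` and `as_tokens_aware` are hypothetical helpers, not part of the PR or of any library.

```python
import inspect
from typing import Callable, List


def accepts_tokens(callback: Callable) -> bool:
    """True if the callable can take two positional arguments (text, tokens)."""
    params = list(inspect.signature(callback).parameters.values())
    positional = [
        p for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_ONLY,
                      inspect.Parameter.POSITIONAL_OR_KEYWORD)
    ]
    has_var_positional = any(
        p.kind is inspect.Parameter.VAR_POSITIONAL for p in params
    )
    return has_var_positional or len(positional) >= 2


def as_tokens_aware(callback: Callable) -> Callable[[str, List[int]], object]:
    """Wrap a text-only callback so it can be called as (text, tokens)."""
    if accepts_tokens(callback):
        return callback
    return lambda text, tokens: callback(text)


seen = []
text_only = as_tokens_aware(lambda text: seen.append(text))
with_tokens = as_tokens_aware(lambda text, tokens: seen.append((text, tokens)))
text_only("a", [1])
with_tokens("b", [2])
print(seen)
```

A caveat of this heuristic: a text-only callback with an extra defaulted parameter, such as `def cb(text, verbose=False)`, would be treated as tokens-aware.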
Comment on lines +147 to +156
if (text.size() <= m_printed_len) {
    // No new text, but flush any unprinted special tokens
    auto chunk_tokens = std::vector<int64_t>(m_tokens_cache.begin() + m_printed_token_idx, m_tokens_cache.end());
    m_tokens_cache.clear();
    m_decoded_lengths.clear();
    m_printed_len = 0;
    m_printed_token_idx = 0;
    if (!chunk_tokens.empty()) {
        m_subword_callback("", chunk_tokens);
    }
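For readers following along from the Python side, the branch above can be mirrored roughly as below. This is a transliteration of the visible hunk only, under the assumption that the surrounding code behaves as the comment says; `StreamState` and `flush_unprinted_tokens` are hypothetical names, and the sample state values are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamState:
    # Stand-ins for m_tokens_cache, m_decoded_lengths,
    # m_printed_len and m_printed_token_idx.
    tokens_cache: List[int] = field(default_factory=list)
    decoded_lengths: List[int] = field(default_factory=list)
    printed_len: int = 0
    printed_token_idx: int = 0


def flush_unprinted_tokens(state: StreamState,
                           callback: Callable[[str, List[int]], None]) -> None:
    """No new text: hand over pending tokens with an empty string, then reset."""
    # Slicing copies the pending tokens before the cache is cleared.
    chunk_tokens = state.tokens_cache[state.printed_token_idx:]
    state.tokens_cache.clear()
    state.decoded_lengths.clear()
    state.printed_len = 0
    state.printed_token_idx = 0
    if chunk_tokens:
        callback("", chunk_tokens)


calls = []
state = StreamState(tokens_cache=[5, 6, 100], decoded_lengths=[2, 4, 4],
                    printed_len=4, printed_token_idx=2)
flush_unprinted_tokens(state, lambda text, toks: calls.append((text, toks)))
# Nothing is pending on the second call, so the callback is not invoked again.
flush_unprinted_tokens(state, lambda text, toks: calls.append((text, toks)))
print(calls)
```

Callbacks therefore need to tolerate an empty `text` argument paired with a non-empty token list.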
Comment on lines +33 to +35
/// @brief Construct with a tokens-aware callback receiving both the decoded text chunk and the token IDs that produced it
TextStreamer(const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const ov::AnyMap& detokenization_params = {});

Comment on lines +116 to +121
    .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
            return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
        }),
        py::arg("tokenizer"),
        py::arg("callback"),
        py::arg("detokenization_params") = ov::AnyMap({}))