Include tokens in text streamer callback by mzegla · Pull Request #3802 · openvinotoolkit/openvino.genai

mzegla · 2026-05-05T13:56:46Z

Description

This change adds token IDs to the text streamer callback for more advanced use cases, for example showing text without special tokens to the user while still keeping those special tokens available for post-processing.

Copilot

Pull request overview

This PR extends ov::genai::TextStreamer so callbacks can optionally receive both the decoded text chunk and the corresponding token IDs, enabling advanced streaming/post-processing use cases (e.g., displaying text with special tokens skipped while still collecting those tokens for downstream logic).

Changes:

- Added a new `TextStreamer` constructor overload that accepts a tokens-aware callback `(text, tokens)`.
- Updated streaming logic to track token indices for each flushed text chunk and pass the corresponding tokens to the callback.
- Added Python bindings for the tokens-aware callback constructor.
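To illustrate the use case from the description, here is a minimal sketch of a tokens-aware callback in Python. It does not import `openvino_genai`; the chunks are replayed by hand so the example runs on its own. `TokenCollector`, the token IDs, and the choice of `2` as a special token are all hypothetical. With the binding added in this PR, such a callable would presumably be passed as the `callback` argument of `TextStreamer(tokenizer, callback)`, and returning `False` is assumed to mean "continue generation".

```python
from typing import Iterable, List


class TokenCollector:
    """Hypothetical (text, tokens) callback: keeps display text and raw token IDs."""

    def __init__(self, special_token_ids: Iterable[int]):
        self._special = set(special_token_ids)
        self._text_parts: List[str] = []
        self.all_tokens: List[int] = []

    def __call__(self, text: str, tokens: List[int]) -> bool:
        # Text is what the user sees; tokens are kept for post-processing.
        self._text_parts.append(text)
        self.all_tokens.extend(tokens)
        return False  # assumption: False means "do not stop generation"

    @property
    def text(self) -> str:
        return "".join(self._text_parts)

    @property
    def special_tokens_seen(self) -> List[int]:
        return [t for t in self.all_tokens if t in self._special]


# Simulated stream: the last chunk carries a special token with no visible text.
simulated_chunks = [("Hello", [101]), (" world", [102]), ("", [2])]

collector = TokenCollector(special_token_ids=[2])
for chunk_text, chunk_tokens in simulated_chunks:
    collector(chunk_text, chunk_tokens)

print(collector.text)
print(collector.special_tokens_seen)
```

The empty-text chunk in the simulation mirrors the case where special tokens are skipped during detokenization but still reported to the callback.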

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/python/py_streamers.cpp | Adds a second `TextStreamer` Python constructor accepting a `(text, tokens)` callback. |
| src/cpp/src/text_streamer.cpp | Implements tokens-aware callback wiring and token-chunk tracking/flushing logic. |
| src/cpp/include/openvino/genai/text_streamer.hpp | Exposes the new constructor overload and adds token index tracking state. |

    py::class_<TextStreamer, std::shared_ptr<TextStreamer>, StreamerBase>(m, "TextStreamer", text_streamer_docstring)
        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string)> callback, const std::map<std::string, py::object>& detokenization_params) {
            return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
        }),
        py::arg("tokenizer"),
        py::arg("callback"),
        py::arg("detokenization_params") = ov::AnyMap({}))
+        .def(py::init([](const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const std::map<std::string, py::object>& detokenization_params) {
+            return std::make_shared<TextStreamer>(tokenizer, callback, pyutils::properties_to_any_map(detokenization_params));
+        }),
+        py::arg("tokenizer"),
+        py::arg("callback"),
+        py::arg("detokenization_params") = ov::AnyMap({}))


+    if (text.size() <= m_printed_len) {
+        // No new text, but flush any unprinted special tokens
+        auto chunk_tokens = std::vector<int64_t>(m_tokens_cache.begin() + m_printed_token_idx, m_tokens_cache.end());
+        m_tokens_cache.clear();
+        m_decoded_lengths.clear();
+        m_printed_len = 0;
+        m_printed_token_idx = 0;
+        if (!chunk_tokens.empty()) {
+            m_subword_callback("", chunk_tokens);
+        }
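The branch above can be modeled in a few lines of Python to make its behavior easier to follow. This is only a sketch of the excerpt shown, not the PR's implementation: `StreamerState` and `flush_without_new_text` are hypothetical names, the fields mirror the `m_` members in the hunk, and `len(text)` counts code points whereas the C++ `text.size()` counts bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StreamerState:
    """Hypothetical stand-in for the streamer members used in the excerpt."""
    tokens_cache: List[int] = field(default_factory=list)
    decoded_lengths: List[int] = field(default_factory=list)
    printed_len: int = 0
    printed_token_idx: int = 0


def flush_without_new_text(
    state: StreamerState,
    text: str,
    callback: Callable[[str, List[int]], object],
) -> bool:
    """Return True if the 'no new text' branch was taken, False otherwise."""
    if len(text) > state.printed_len:
        return False  # new text exists; that path is not part of the excerpt

    # Tokens cached after the last printed chunk (e.g. skipped special tokens).
    chunk_tokens = state.tokens_cache[state.printed_token_idx:]

    # Reset all tracking state, as the excerpt does.
    state.tokens_cache.clear()
    state.decoded_lengths.clear()
    state.printed_len = 0
    state.printed_token_idx = 0

    # Report pending tokens with an empty text chunk.
    if chunk_tokens:
        callback("", chunk_tokens)
    return True


received = []
state = StreamerState(tokens_cache=[10, 11, 12], decoded_lengths=[2, 5, 5],
                      printed_len=5, printed_token_idx=2)
taken = flush_without_new_text(state, "Hello", lambda t, ids: received.append((t, ids)))
print(taken, received)
```

In this model the callback fires only when unreported tokens remain, so a flush with nothing pending resets the state silently.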


+    /// @brief Construct with a tokens-aware callback receiving both the decoded text chunk and the token IDs that produced it
+    TextStreamer(const Tokenizer& tokenizer, std::function<CallbackTypeVariant(std::string, std::vector<int64_t>)> callback, const ov::AnyMap& detokenization_params = {});
+




init

8f23d24

Copilot AI review requested due to automatic review settings May 5, 2026 13:56

github-actions Bot added the labels category: Python API (Python API for GenAI), category: CPP API (Changes in GenAI C++ public headers), and category: text streamer on May 5, 2026

Copilot started reviewing on behalf of mzegla May 5, 2026 13:57

Copilot AI reviewed May 5, 2026

mzegla wants to merge 1 commit into openvinotoolkit:master from mzegla:stream_tokens


2 participants
