
Add support for Chat response parsing #1639

Open
xenova wants to merge 7 commits into main from chat-response-parsing

Conversation

@xenova
Collaborator

xenova commented Apr 10, 2026

more info: https://huggingface.co/docs/transformers/en/chat_response_parsing

transformers reference PR: huggingface/transformers#44674

Example usage:

import {
  AutoProcessor,
  Gemma4ForConditionalGeneration,
  TextStreamer,
} from "@huggingface/transformers";

// Load processor and model
const model_id = "onnx-community/gemma-4-E2B-it-ONNX";
const processor = await AutoProcessor.from_pretrained(model_id);
const model = await Gemma4ForConditionalGeneration.from_pretrained(model_id, {
  dtype: {
    audio_encoder: "q4f16",
    vision_encoder: "q4f16",
    embed_tokens: "q4f16",
    decoder_model_merged: "q4f16",
  },
  device: "webgpu",
});

// Define tools

const tools = [
  {
    type: "function",
    function: {
      name: "get_weather",
      description: "Get current weather information for a location",
      parameters: {
        type: "object",
        properties: {
          location: {
            type: "string",
            description: "The city and state, e.g. San Francisco, CA",
          },
          unit: {
            type: "string",
            enum: ["celsius", "fahrenheit"],
            description: "The unit of temperature to use",
          },
        },
        required: ["location"],
      },
    },
  },
];

// Prepare messages
const messages = [
  {
    role: "user",
    content: "What is the weather like in New York?",
  },
];
const prompt = processor.apply_chat_template(messages, {
  add_generation_prompt: true,
  tools,
});

// Prepare inputs
const inputs = await processor(prompt, null, null, {
  add_special_tokens: false,
});

// Helper to simulate tool execution
function executeTool(name, _args) {
  if (name === "get_weather") {
    // Simulate a weather API response
    return {
      temperature: 25,
      unit: "celsius",
      description: "Sunny with a few clouds",
    };
  }
  return { error: `Unknown tool: ${name}` };
}

// First generation: model should produce a tool call
console.log("=== First generation (expecting tool call) ===");
const outputs = await model.generate({
  ...inputs,
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(processor.tokenizer, {
    skip_prompt: true,
    skip_special_tokens: false,
  }),
});

// Decode the first output to extract tool calls
const firstOutput = processor.batch_decode(
  outputs.slice(null, [inputs.input_ids.dims.at(-1), null]),
  { skip_special_tokens: false },
)[0];

const parsed = processor.parse_response(firstOutput);
console.log("\nParsed response:", JSON.stringify(parsed, null, 2));
const toolCalls = parsed.tool_calls ?? [];

if (toolCalls.length > 0) {
  // Execute tools and collect responses
  const toolResponses = toolCalls.map((tc) => ({
    name: tc.function.name,
    response: executeTool(tc.function.name, tc.function.arguments),
  }));
  console.log("Tool responses:", JSON.stringify(toolResponses, null, 2));

  // Build the full conversation with tool call + tool response
  messages.push({
    role: "assistant",
    tool_calls: toolCalls,
  });
  messages.push({
    role: "user",
    tool_responses: toolResponses,
  });

  // Re-apply chat template with the full conversation
  const prompt2 = processor.apply_chat_template(messages, {
    add_generation_prompt: true,
    tools,
  });

  const inputs2 = await processor(prompt2, null, null, {
    add_special_tokens: false,
  });

  // Second generation: model should produce a final answer
  console.log("\n=== Second generation (expecting final answer) ===");
  const outputs2 = await model.generate({
    ...inputs2,
    max_new_tokens: 512,
    do_sample: false,
    streamer: new TextStreamer(processor.tokenizer, {
      skip_prompt: true,
      skip_special_tokens: false,
    }),
  });

  const finalOutput = processor.batch_decode(
    outputs2.slice(null, [inputs2.input_ids.dims.at(-1), null]),
    { skip_special_tokens: true },
  )[0];
  console.log("\nFinal answer:", finalOutput);
}
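
For reference, parsed in the first generation is consumed as roughly the following shape (inferred from how the code above uses it, not from the library docs; the exact fields depend on the model's parsing schema):

{
  "tool_calls": [
    {
      "function": {
        "name": "get_weather",
        "arguments": { "location": "New York" }
      }
    }
  ]
}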

@Rocketknight1 @nico-martin

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova
Collaborator Author

xenova commented Apr 12, 2026

I only implemented a minimal set of the required jmespath functionality, so as not to bloat the library... @Rocketknight1 lmk what functionality you think is needed in addition to the current set (which basically only covers the cases outlined in your original tests).
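
For context, that subset boils down to dotted field access, numeric indexing, and [*] projections. A minimal sketch of what such an evaluator looks like (illustrative only, not the actual implementation):

// Minimal jmespath-like evaluator: dotted field access, numeric indexing,
// and [*] projections (a sketch, not the library code).
function evalPath(obj, path) {
  const tokens = path.match(/[^.[\]]+|\[\*\]/g) ?? [];
  let current = [obj];
  let projected = false;
  for (const tok of tokens) {
    if (tok === "[*]") {
      // Wildcard projection: flatten one level of arrays.
      projected = true;
      current = current.flatMap((v) => (Array.isArray(v) ? v : []));
    } else {
      // Plain field access or numeric index.
      current = current.map((v) => v?.[tok]);
    }
  }
  return projected ? current : current[0];
}

// e.g. evalPath(parsed, "tool_calls[*].function.name") -> ["get_weather"]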

@Rocketknight1
Member

API looks good! One thing I'll say is that we probably won't have a perfectly clean implementation like we do with jinja, where we almost never need to extend the spec for new models. The "cascading regex plus some predefined parsers" approach works in most cases, but future models will occasionally require us to add an extra custom parser because they have a very weird tool call format. In that case I'll try to remember to ping you on the Python PR so you can implement it here, but those parsers shouldn't be long (Gemma4JsontoJson is an example of exactly this).
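
Conceptually, the cascade is just an ordered list of candidate parsers tried until one succeeds, roughly like this (hypothetical delimiters and parser bodies, purely for illustration):

// Try model-specific parsers first, then fall back to generic extraction.
const TOOL_CALL_PARSERS = [
  // Model-specific: a JSON payload between custom delimiters.
  (text) => {
    const m = text.match(/<tool_call>([\s\S]*?)<\/tool_call>/);
    return m ? [JSON.parse(m[1])] : null;
  },
  // Generic fallback: a bare JSON object containing a "name" key.
  (text) => {
    const m = text.match(/\{[\s\S]*"name"[\s\S]*\}/);
    return m ? [JSON.parse(m[0])] : null;
  },
];

function parseToolCalls(text) {
  for (const parser of TOOL_CALL_PARSERS) {
    try {
      const result = parser(text);
      if (result) return result;
    } catch {
      // Malformed candidate; fall through to the next parser.
    }
  }
  return [];
}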

@xenova
Collaborator Author

xenova commented Apr 13, 2026

Thanks @Rocketknight1! Yeah that sounds good.

@sroussey
Contributor

This would be great. I wrote a bunch of trash just trying to figure out the different ways different models do things (deepseek 2 vs 3.1, vs llama's three ways, vs hermes/qwen, etc --- oh don't forget functiongema).

https://github.com/workglow-dev/workglow/blob/main/packages/ai-provider/src/provider-hf-transformers/common/HFT_ToolParser.ts

@xenova
Collaborator Author

xenova commented Apr 14, 2026

different models do things (deepseek 2 vs 3.1, vs llama's three ways, vs hermes/qwen, etc --- oh don't forget functiongema).

(image)

@nico-martin
Collaborator

Gave it a little deep dive and I like the approach. Could we also add this to the TextGenerationPipeline so it returns a clean object?
Also, I think it would make sense if we could parse streamed chunks: if a response contains multiple tool calls, or tool calls plus extra text, applications could then start executing a tool while the rest of the response is still being generated.
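
Something along these lines, just to sketch the idea (tryParsePartial is hypothetical; today parse_response only takes the full text):

let buffer = "";
const started = new Set();
const streamer = new TextStreamer(processor.tokenizer, {
  skip_prompt: true,
  callback_function: (chunk) => {
    buffer += chunk;
    // tryParsePartial is a hypothetical incremental parser that reports
    // which tool calls are already complete in the partial text.
    const calls = tryParsePartial(buffer)?.tool_calls ?? [];
    for (const [i, call] of calls.entries()) {
      if (!started.has(i)) {
        started.add(i);
        executeTool(call.function.name, call.function.arguments);
      }
    }
  },
});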
