Releases: huggingface/transformers.js

4.2.0

23 Apr 12:49

🚀 Transformers.js v4.2 — Tool calling, simpler internals, and privacy filtering

  • Added tools to TextGenerationPipeline in #1655
  • Use inputMetadata API for simplified internals in #1657
  • Add support for OpenAI privacy filter model in #1658

Full Changelog: 4.1.0...4.2.0

4.1.0

23 Apr 12:49

🚀 Transformers.js v4.1 — Gemma 4, KV cache improvements, and new quantization dtypes

  • Add support for Gemma 4 in #1627
  • Cached generation improvements (+ past_key_values via pipeline function) in #1638
  • Improve tokenizer types based on input function parameters in #1641
  • Add support for q1, q1f16, q2, and q2f16 data types in #1647
  • Re-enable SmolVLM in #1648
  • Update default generation parameters in #1649
  • Pin GitHub Actions to commit SHAs in #1626

Full Changelog: 4.0.0...4.1.0

4.0.1

23 Apr 13:03
64da635

What's new?

Full Changelog: 4.0.0...4.0.1

4.0.0

30 Mar 12:55
364ebd4

🚀 Transformers.js v4

We're excited to announce that Transformers.js v4 is now available on NPM! After a year of development (we started in March 2025 🤯), we're finally ready for you to use it.

npm i @huggingface/transformers

Links: YouTube Video, Blog Post, Demo Collection

New WebGPU backend

The biggest change is undoubtedly the adoption of a new WebGPU Runtime, completely rewritten in C++. We've worked closely with the ONNX Runtime team to thoroughly test this runtime across our ~200 supported model architectures, as well as many new v4-exclusive architectures.

In addition to better operator support (for performance, accuracy, and coverage), this new WebGPU runtime allows the same transformers.js code to be used across a wide variety of JavaScript environments, including browsers, server-side runtimes, and desktop applications. That's right, you can now run WebGPU-accelerated models directly in Node, Bun, and Deno!

WebGPU Overview

We've proven that it's possible to run state-of-the-art AI models 100% locally in the browser, and now we're focused on performance: making these models run as fast as possible, even in resource-constrained environments. This required completely rethinking our export strategy, especially for large language models. We achieve this by re-implementing new models operation by operation, leveraging specialized ONNX Runtime Contrib Operators like com.microsoft.GroupQueryAttention, com.microsoft.MatMulNBits, or com.microsoft.QMoE to maximize performance.

For example, by adopting the com.microsoft.MultiHeadAttention operator, we achieved a ~4x speedup for BERT-based embedding models.

Optimized ONNX Exports

  • ONNX Runtime improvements by @xenova in #1306
  • Transformers.js V4: Native WebGPU EP, repo restructuring, and more! by @xenova in #1382

New models

Thanks to our new export strategy and ONNX Runtime's expanding support for custom operators, we've been able to add many new models and architectures to Transformers.js v4. These include popular models like GPT-OSS, Chatterbox, GraniteMoeHybrid, LFM2-MoE, HunYuanDenseV1, Apertus, Olmo3, FalconH1, and Youtu-LLM. Many of these required us to implement support for advanced architectural patterns, including Mamba (state-space models), Multi-head Latent Attention (MLA), and Mixture of Experts (MoE). Perhaps most importantly, these models are all compatible with WebGPU, allowing users to run them directly in the browser or server-side JavaScript environments with hardware acceleration. We've released several Transformers.js v4 demos so far... and we'll continue to release more!

Additionally, we've added support for larger models exceeding 8B parameters. In our tests, we've been able to run GPT-OSS 20B (q4f16) at ~60 tokens per second on an M4 Pro Max.
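
As a rough back-of-the-envelope check on why quantization matters at this scale, the sketch below estimates raw weight storage from parameter count and bits per weight. The `estimateWeightBytes` helper is hypothetical, and the calculation assumes every weight is stored at the stated bit width. It ignores quantization scales and zero points, layers kept at higher precision, activations, and the KV cache, so real file sizes and memory use will differ.

```javascript
// Hypothetical helper: raw weight storage, ignoring all quantization overhead.
function estimateWeightBytes(numParams, bitsPerWeight) {
  return (numParams * bitsPerWeight) / 8;
}

const GB = 1e9; // decimal gigabytes

// A nominal 20B-parameter model at 4 bits vs. 16 bits per weight.
console.log(estimateWeightBytes(20e9, 4) / GB);  // 10
console.log(estimateWeightBytes(20e9, 16) / GB); // 40
```

Under these assumptions, 4-bit weights need about a quarter of the storage of 16-bit weights.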

New features

ModelRegistry

The new ModelRegistry API is designed for production workflows. It provides explicit visibility into pipeline assets before anything is loaded: list required files with get_pipeline_files, inspect per-file metadata with get_file_metadata (useful for calculating the total download size), check cache status with is_pipeline_cached, and clear cached artifacts with clear_pipeline_cache. You can also query the available precision types for a model with get_available_dtypes. Building on this API, progress_callback now emits a progress_total event, making it easy to render end-to-end loading progress without manually aggregating per-file updates.

See `ModelRegistry` examples
import { ModelRegistry, pipeline } from "@huggingface/transformers";

const modelId = "onnx-community/all-MiniLM-L6-v2-ONNX";
const modelOptions = { dtype: "fp32" };

const files = await ModelRegistry.get_pipeline_files(
  "feature-extraction",
  modelId,
  modelOptions
);
// ['config.json', 'onnx/model.onnx', ..., 'tokenizer_config.json']

const metadata = await Promise.all(
  files.map(file => ModelRegistry.get_file_metadata(modelId, file))
);

const downloadSize = metadata.reduce((total, item) => total + item.size, 0);

const cached = await ModelRegistry.is_pipeline_cached(
  "feature-extraction",
  modelId,
  modelOptions
);

const dtypes = await ModelRegistry.get_available_dtypes(modelId);
// ['fp32', 'fp16', 'q4', 'q4f16']

if (cached) {
  await ModelRegistry.clear_pipeline_cache(
    "feature-extraction",
    modelId,
    modelOptions
  );
}

const pipe = await pipeline(
  "feature-extraction",
  modelId,
  {
    progress_callback: e => {
      if (e.status === "progress_total") {
        console.log(`${Math.round(e.progress)}%`);
      }
    },
  }
);
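
The values produced above can be turned into user-facing output with a few small helpers. The sketch below is library-independent and all function names (`formatBytes`, `totalDownloadSize`, `progressLine`) are hypothetical. It assumes, as in the example above, that each metadata item exposes a numeric `size` in bytes and that `progress_total` events carry a `progress` percentage between 0 and 100.

```javascript
// Format a byte count using binary units (1 KB = 1024 B).
function formatBytes(bytes) {
  const units = ["B", "KB", "MB", "GB"];
  let value = bytes;
  let i = 0;
  while (value >= 1024 && i < units.length - 1) {
    value /= 1024;
    i++;
  }
  return `${value.toFixed(i === 0 ? 0 : 1)} ${units[i]}`;
}

// Sum per-file sizes; items without a size count as 0.
function totalDownloadSize(metadata) {
  return metadata.reduce((total, item) => total + (item.size ?? 0), 0);
}

// Render a text progress bar for aggregate events; ignore everything else.
function progressLine(event, width = 20) {
  if (event.status !== "progress_total") return null;
  const pct = Math.min(100, Math.max(0, event.progress));
  const filled = Math.round((pct / 100) * width);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${Math.round(pct)}%`;
}

// Sample data standing in for real get_file_metadata results.
const sampleMetadata = [{ size: 1024 }, { size: 512 }];
console.log(formatBytes(totalDownloadSize(sampleMetadata))); // 1.5 KB
console.log(progressLine({ status: "progress_total", progress: 50 }, 10)); // [#####-----] 50%
```

A function like `progressLine` could be called from inside `progress_callback`, writing its result to the console or a status element whenever it returns a non-null string.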

New Environment Settings

We also added new environment controls for model loading. env.useWasmCache enables caching of WASM runtime files (when cache storage is available), allowing applications to work fully offline after the initial load.

env.fetch lets you provide a custom fetch implementation for use cases such as authenticated model access, custom headers, and abortable requests.

See `env` examples
import { env } from "@huggingface/transformers";

env.useWasmCache = true;

// MY_TOKEN is a placeholder for your own access token; define it before use.
env.fetch = (url, options) =>
  fetch(url, {
    ...options,
    headers: {
      ...options?.headers,
      Authorization: `Bearer ${MY_TOKEN}`,
    },
  });
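
One way to make a wrapper like this reusable and testable is a small factory. This is a sketch, not part of the library: `createAuthenticatedFetch` and its `baseFetch` parameter are hypothetical names. It relies only on the standard `Headers` class, which also copes with `options.headers` being a `Headers` instance rather than a plain object, and it passes any caller-supplied `signal` through so requests stay abortable.

```javascript
// Hypothetical factory: returns a fetch-compatible function that adds a bearer token.
// baseFetch is injectable so the wrapper can be exercised without network access.
function createAuthenticatedFetch(token, baseFetch = globalThis.fetch) {
  return (url, options = {}) => {
    // new Headers(...) accepts plain objects, Headers instances, or undefined.
    const headers = new Headers(options.headers);
    headers.set("Authorization", `Bearer ${token}`);
    // Spreading options keeps fields such as `signal`, so aborting still works.
    return baseFetch(url, { ...options, headers });
  };
}

// Usage sketch (MY_TOKEN is a placeholder):
//   env.fetch = createAuthenticatedFetch(MY_TOKEN);
```

In practice you may want to attach the token only for hosts you trust, so that it is not sent along with requests to unrelated origins.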

Improved Logging Controls

Finally, logging is easier to manage in real-world deployments. ONNX Runtime WebGPU warnings are now hidden by default, and you can set explicit verbosity levels for both Transformers.js and ONNX Runtime. This update, also driven by community feedback, keeps console output focused on actionable signals rather than low-value noise.

See `logLevel` example
import { env, LogLevel } from "@huggingface/transformers";

// LogLevel.DEBUG
// LogLevel.INFO
// LogLevel.WARNING
// LogLevel.ERROR
// LogLevel.NONE

env.logLevel = LogLevel.WARNING;

3.8.1

02 Dec 14:37
2ec882e

What's new?

  • Add support for Ministral 3 in #1474
  • Fix Ernie 4.5 naming in #1473
  • Update Supertonic TTS paper + authors in #1463

Full Changelog: 3.8.0...3.8.1

3.8.0

19 Nov 16:50
bf09aaf

🚀 Transformers.js v3.8 — SAM2, SAM3, EdgeTAM, Supertonic TTS

  • Add support for EdgeTAM in #1454

  • Add support for Supertonic TTS in #1459

    Example:

    import { pipeline } from '@huggingface/transformers';
    
    const tts = await pipeline('text-to-speech', 'onnx-community/Supertonic-TTS-ONNX');
    
    const input_text = 'This is really cool!';
    const audio = await tts(input_text, {
        speaker_embeddings: 'https://huggingface.co/onnx-community/Supertonic-TTS-ONNX/resolve/main/voices/F1.bin',
    });
    await audio.save('output.wav');

  • Add support for SAM2 and SAM3 (Tracker) in #1461

  • Remove Metaspace add_prefix_space logic in #1451

  • ImageProcessor preprocess uses image_std for fill value by @NathanKolbas in #1455

New Contributors

Full Changelog: 3.7.6...3.8.0

3.7.6

20 Oct 19:44
4c908ec

What's new?

New Contributors

Full Changelog: 3.7.5...3.7.6

3.7.5

02 Oct 13:58
c670bb9

What's new?

  • Add support for GraniteMoeHybrid in #1426

Full Changelog: 3.7.4...3.7.5

3.7.4

29 Sep 17:40
d6b3998

What's new?

  • Correctly assign logits warpers in _get_logits_processor in #1422

Full Changelog: 3.7.3...3.7.4

3.7.3

12 Sep 20:35
699dcb5

What's new?

  • Unify inference chains in #1399
  • Fix progress tracking bug by @kukudixiaoming in #1405
  • Add support for MobileLLM-R1 (llama4_text) in #1412
  • Add support for VaultGemma in #1413

New Contributors

Full Changelog: 3.7.2...3.7.3