
Release v3.2.0 #1294

Merged
jlarson4 merged 16 commits into main from dev on May 8, 2026

Conversation

jlarson4 (Collaborator) commented on May 8, 2026

Description

Added mT5 support, improved quantization support, and several fixes for older issues. See the v3.2.0 release log for full details.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

brendanlong and others added 16 commits May 4, 2026 10:26
* Fix type of HookedTransformerConfig.device

This is typed as `Optional[str]` but sometimes returns `torch.device`.
Updated the code to just return the `str` instead of wrapping with a
device.

I'm not confident that every function which takes a device will
always be passed a string, so I didn't change functions like
warn_if_mps.

Found while working on #1219
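
For illustration, a minimal sketch of the normalization, using a hypothetical `normalize_device` helper (not the actual patch, which edits the config property directly):

```python
import torch
from typing import Optional, Union

def normalize_device(device: Optional[Union[str, torch.device]]) -> Optional[str]:
    # Hypothetical helper: collapse whatever the caller hands us to the
    # plain string that the `Optional[str]` annotation promises.
    if device is None:
        return None
    return str(torch.device(device))  # torch.device("cuda:0") -> "cuda:0"
```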

* more cleanup

* 3.0 CI Bugs (#1261)

* Fixing `utils` imports

* skip gated notebooks on PR from forks

* Updating notebooks

* Ensure LLaMA only runs when HF_TOKEN is available (see the sketch after this block)

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
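
The HF_TOKEN gating mentioned above is the standard pytest pattern; a hedged sketch (the marker and test names are illustrative, not the repo's actual code):

```python
import os

import pytest

# Skip gated-model tests when no Hugging Face token is configured.
requires_hf_token = pytest.mark.skipif(
    "HF_TOKEN" not in os.environ,
    reason="LLaMA weights are gated; set HF_TOKEN to run this test",
)

@requires_hf_token
def test_llama_smoke():
    ...  # load and exercise the gated model here
```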
* Fix tokenizer-free generate() and add regression tests

* Drop accidental tokenize_utils.py changes (superseded by #1273 on dev)

* mypy and make check-format

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
…el is passed. This avoids unnecessary network calls. (#1279)
* Fix: IOIDataset generates diverse samples (Fixes #515)

random.seed(42) was called at the top of get_sample() on every
invocation, resetting the RNG before each draw. As a result, all
samples in the dataset were identical regardless of num_samples.

Changes:
- Remove random.seed(42) from get_sample()
- Add optional seed parameter to __init__() for reproducible datasets
  (seed=None by default — fully backward compatible)
- Add docstring to __init__()
- Add 5 unit tests covering diversity, reproducibility, and edge cases

Fixes #515
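
A minimal sketch of the bug and the fix, with simplified signatures (the real class generates IOI prompts, not bare names):

```python
import random

class IOIDataset:
    def __init__(self, num_samples: int = 100, seed=None):
        # Seed once, at construction. seed=None preserves the old
        # nondeterministic behavior, so existing callers see no change.
        self.rng = random.Random(seed)
        self.num_samples = num_samples

    def get_sample(self):
        # Before the fix, random.seed(42) ran here on every call,
        # resetting the RNG so every sample came out identical.
        return self.rng.choice(["Alice", "Bob", "Carol"])
```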

* Fix CI: black formatting + seed the IOIDataset doctest

- Apply black formatting to test_evals_ioi.py
- Update IOIDataset doctest to use seed=42 for deterministic results.
  The doctest previously passed because all samples were identical
  (the bug being fixed); now that samples are diverse, the result
  depends on RNG state, so we seed for reproducibility.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Fix: preserve tokenizer.padding_side when reloading with add_bos_token (#1283)

When get_tokenizer_with_bos() reloads a tokenizer via
AutoTokenizer.from_pretrained(), HuggingFace silently resets padding_side
to its default (usually 'right'). This caused user-set padding_side='left'
to be discarded when the tokenizer was passed into HookedTransformer,
affecting Gemma, Falcon, and other decoder models that need left padding
for batched generation.

Fix: copy padding_side from the original tokenizer to the reloaded one.

Add 3 regression tests covering left, right, and the no-reload path.

Fixes #801
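
A sketch of the fix, with a simplified signature (the real get_tokenizer_with_bos takes more arguments):

```python
from transformers import AutoTokenizer

def get_tokenizer_with_bos(tokenizer):
    # Reloading resets padding_side to the model default
    # (usually 'right'), silently discarding a user's setting.
    reloaded = AutoTokenizer.from_pretrained(
        tokenizer.name_or_path, add_bos_token=True
    )
    # The fix: carry the original padding_side over explicitly,
    # e.g. 'left' for batched generation on decoder-only models.
    reloaded.padding_side = tokenizer.padding_side
    return reloaded
```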

* Fix CI: revert unrelated black 23 reformat noise

* Re-trigger CI (Othello flake)

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Fix type of HookedTransformerConfig.device (#1230)

* feat: Add MPS CI runner support (#1264)

* ci: Enable runs on feature branch

* fix: Skip heavy model_bridge unit tests on MPS runner due to memory limits

* fix: Ignore flaky grouped query attention tests on Mac runner

* style: Run make format on test_mps_basic.py

* style: Standardize MPS step naming convention in CI

* ci: Revert MPS trigger to run only on main PRs and pushes

* ci: Remove feature branch from global workflow triggers

* fix: Restore torch.device return type in get_device for API stability

* docs: Align train and device config docstrings with implementation

* fix: Update device type hints to Union[str, torch.device] for consistency

* ci: Update mps-checks trigger to include dev branch

* test: Update device tests for torch.device compatibility and robustness

* ci: Restrict MPS trigger to main branch only

* ci: Pass HF_TOKEN to MPS check jobs

* fix: Revert device type hints and tests to strictly use strings

* ci: Refine MPS test coverage

* cleanup: Remove unused no_mps marker

* fix: Remove unused Union import in lit/model.py

* ci: Temporarily enable dev trigger for verification

* ci: Re-ignore tests that fail with NaNs or precision errors on MPS

* ci: Restrict MPS trigger to main branch

* fix: Use torch.allclose for GQA tests to allow MPS float precision delta (see the sketch after this list)

* ci: Enable MPS checks on dev branches

* style: Run make format on test_grouped_query_attention.py

* ci: Revert MPS trigger to run only on main PRs and pushes

---------

Co-authored-by: Brendan Long <self@brendanlong.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
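
On the torch.allclose change for the GQA tests: exact equality is too strict on MPS, where float accumulation order differs from CPU. The tolerances below are illustrative only, not the values used in the tests:

```python
import torch

cpu_out = torch.randn(4, 8)
mps_out = cpu_out + 1e-6 * torch.randn(4, 8)  # stand-in for an MPS result

assert not torch.equal(cpu_out, mps_out)  # a bitwise match fails
assert torch.allclose(cpu_out, mps_out, rtol=1e-4, atol=1e-5)  # tolerance passes
```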
* Docs: clarify n_params excludes embeddings/biases/layer norms

The previous docstring noted 'Non embedding parameters' and 'Ignoring
biases and layer norms' but users still computed memory footprints from
n_params and got confused (see #448).

Make the exclusions unambiguous and explicitly point to
`sum(p.numel() for p in model.parameters())` for the total parameter
count. No API change.

Fixes #448

* Add n_params_total property for total parameter count

Per #448 (Neel Nanda): users want a parameter count that includes
embeddings/biases, matching HF's model.num_parameters() and the Pythia
reporting convention. cfg.n_params only counts 'hidden weight' params
(scaling-laws convention) which is more useful for predicting performance
but confusing for memory-budget calculations.

Adds an additive HookedTransformer.n_params_total property that returns
sum(p.numel() for p in self.parameters()). Existing cfg.n_params behavior
is unchanged (no API break).

The accompanying docstring update on cfg.n_params makes the distinction
explicit and points users to the new property.

4 unit tests covering attn-only, MLP, equivalence with sum(p.numel()), and return type.
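
Based on the description above, usage should look roughly like this (attribute names taken from the PR text, not checked against the final diff):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# cfg.n_params: 'hidden weight' params only (no embeddings, biases, or
# layer norms) -- the scaling-laws convention, unchanged by this PR.
print(model.cfg.n_params)

# n_params_total: every parameter, matching HF's num_parameters().
assert model.n_params_total == sum(p.numel() for p in model.parameters())
```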

* Fix CI: remove unused torch import flagged by pycln

* Fix CI: don't pass act_fn=None (beartype rejects)

beartype is enabled in CI and validates type hints at runtime.
HookedTransformerConfig declares act_fn: str (not Optional[str]),
so passing None — even when attn_only=True — fails the type check.

Build kwargs dict and only include act_fn/d_mlp when needed.
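
A sketch of the workaround; the config fields are real HookedTransformerConfig fields, the dimensions arbitrary:

```python
from transformer_lens import HookedTransformerConfig

attn_only = True
kwargs = dict(n_layers=1, d_model=8, n_ctx=16, d_head=4, attn_only=attn_only)
if not attn_only:
    # Only mention act_fn/d_mlp when the model actually has an MLP;
    # passing act_fn=None trips beartype's runtime check of `act_fn: str`.
    kwargs.update(act_fn="gelu", d_mlp=32)
cfg = HookedTransformerConfig(**kwargs)
```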

* Address review feedback (#1284)

- Replace tautological test (sum(p.numel()) == sum(p.numel())) with hand-
  computed expected values for two small fixtures (attn-only and with MLP).
  Test docstrings show the breakdown so reviewers can verify by inspection.

- Add end-to-end test on real loaded GPT-2 (cached by CI). Asserts the
  property reflects actual model.parameters() and that the count is in
  the GPT-2 band. Note: a strict HF parity check fails because TL stores
  W_E and W_U as separate Parameters while HF GPT-2 ties lm_head — the
  expected delta is d_vocab * d_model. Documented inline.

- Mirror n_params_total to TransformerBridge for consistency.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Updating tokenizer information on TransformerBridge table to be more detailed

* Updating docs build watcher to properly hot-reload for model table changes

* Update Interactive Model table to properly load information again

* Improved docstring to wrap up #99

* Updated `prepend_bos` comments to resolve #100

* Added tokenization tests for the Bridge

* Added new demo to show how to run lm-eval-harness with TransformerBridge

* stripped stale output

* Add optional evals dependencies

* forwarding the details of PR #473 to our modern TransformerBridge system

* Fixing notebook checks for eval

* Clean up issues with demo

* Improve typing
* Add testing and documentation for GatedMLP

* Added BERT tests and additional demo blocks

* Added support for mt5, added a fix for handling T5 models on Transformers v5 (see the sketch after this commit list)

* Verifying support for SimpleStories

* Added additional model support

* Updating the verification system to allow all models from canonical authors

* Add additional canonical authors

* Improved quantization skipping and messages

* Updated scrape with fixes to prevent timeouts and loss of Gaps details

* updated model_properties_table
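
For the headline mT5 support, a hedged loading sketch, assuming mT5 goes through HookedEncoderDecoder the way T5 does (the v3.2.0 entry point may instead be TransformerBridge; check the model table):

```python
from transformer_lens import HookedEncoderDecoder

# Assumption: mT5 loads via the existing T5-family encoder-decoder class.
model = HookedEncoderDecoder.from_pretrained("google/mt5-small")
print(model.cfg.n_layers)
```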
@jlarson4 jlarson4 merged commit 31d4f6a into main May 8, 2026
91 of 96 checks passed