
Release v3.2.0 #1294

Merged
jlarson4 merged 16 commits into main from dev on May 8, 2026

Conversation

jlarson4 (Collaborator) commented on May 8, 2026

Description

Added mT5 support, improved quantization support, and several fixes for older issues. See the v3.2.0 release log for full details.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

brendanlong and others added 16 commits May 4, 2026 10:26
* Fix type of HookedTransformerConfig.device

This is typed as `Optional[str]` but sometimes returns `torch.device`.
Updated the code to just return the `str` instead of wrapping with a
device.

I'm not confident that every function which takes a device will
always be passed a string, so I didn't change functions like
warn_if_mps.

Found while working on #1219
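
For illustration, a minimal sketch of the normalization, using a hypothetical `normalize_device` helper (not the actual patch, which edits the config property directly):

```python
import torch
from typing import Optional, Union

def normalize_device(device: Optional[Union[str, torch.device]]) -> Optional[str]:
    # Hypothetical helper: collapse whatever the caller hands us to the
    # plain string that the `Optional[str]` annotation promises.
    if device is None:
        return None
    return str(torch.device(device))  # torch.device("cuda:0") -> "cuda:0"
```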

* more cleanup

* 3.0 CI Bugs (#1261)

* Fixing `utils` imports

* skip gated notebooks on PR from forks

* Updating notebooks

* Ensure LLaMA only runs when HF_TOKEN is available (see the sketch after this block)

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
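
The HF_TOKEN gating mentioned above is the standard pytest pattern; a hedged sketch (the marker and test names are illustrative, not the repo's actual code):

```python
import os

import pytest

# Skip gated-model tests when no Hugging Face token is configured.
requires_hf_token = pytest.mark.skipif(
    "HF_TOKEN" not in os.environ,
    reason="LLaMA weights are gated; set HF_TOKEN to run this test",
)

@requires_hf_token
def test_llama_smoke():
    ...  # load and exercise the gated model here
```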
* Fix tokenizer-free generate() and add regression tests

* Drop accidental tokenize_utils.py changes (superseded by #1273 on dev)

* mypy and make check-format

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
…el is passed. This avoids unnecessary network calls. (#1279)
* Fix: IOIDataset generates diverse samples (Fixes #515)

random.seed(42) was called at the top of get_sample() on every
invocation, resetting the RNG before each draw. As a result, all
samples in the dataset were identical regardless of num_samples.

Changes:
- Remove random.seed(42) from get_sample()
- Add optional seed parameter to __init__() for reproducible datasets
  (seed=None by default — fully backward compatible)
- Add docstring to __init__()
- Add 5 unit tests covering diversity, reproducibility, and edge cases

Fixes #515
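
A minimal sketch of the bug and the fix, with simplified signatures (the real class generates IOI prompts, not bare names):

```python
import random

class IOIDataset:
    def __init__(self, num_samples: int = 100, seed=None):
        # Seed once, at construction. seed=None preserves the old
        # nondeterministic behavior, so existing callers see no change.
        self.rng = random.Random(seed)
        self.num_samples = num_samples

    def get_sample(self):
        # Before the fix, random.seed(42) ran here on every call,
        # resetting the RNG so every sample came out identical.
        return self.rng.choice(["Alice", "Bob", "Carol"])
```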

* Fix CI: black formatting + seed the IOIDataset doctest

- Apply black formatting to test_evals_ioi.py
- Update IOIDataset doctest to use seed=42 for deterministic results.
  The doctest previously passed because all samples were identical
  (the bug being fixed); now that samples are diverse, the result
  depends on RNG state, so we seed for reproducibility.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Fix: preserve tokenizer.padding_side when reloading with add_bos_token (#1283)

When get_tokenizer_with_bos() reloads a tokenizer via
AutoTokenizer.from_pretrained(), HuggingFace silently resets padding_side
to its default (usually 'right'). This caused user-set padding_side='left'
to be discarded when the tokenizer was passed into HookedTransformer,
affecting Gemma, Falcon, and other decoder models that need left padding
for batched generation.

Fix: copy padding_side from the original tokenizer to the reloaded one.

Add 3 regression tests covering left, right, and the no-reload path.

Fixes #801
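
A sketch of the fix, with a simplified signature (the real get_tokenizer_with_bos takes more arguments):

```python
from transformers import AutoTokenizer

def get_tokenizer_with_bos(tokenizer):
    # Reloading resets padding_side to the model default
    # (usually 'right'), silently discarding a user's setting.
    reloaded = AutoTokenizer.from_pretrained(
        tokenizer.name_or_path, add_bos_token=True
    )
    # The fix: carry the original padding_side over explicitly,
    # e.g. 'left' for batched generation on decoder-only models.
    reloaded.padding_side = tokenizer.padding_side
    return reloaded
```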

* Fix CI: revert unrelated black 23 reformat noise

* Re-trigger CI (Othello flake)

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Fix type of HookedTransformerConfig.device (#1230)

* feat: Add MPS CI runner support (#1264)

* ci: Enable runs on feature branch

* fix: Skip heavy model_bridge unit tests on MPS runner due to memory limits

* fix: Ignore flaky grouped query attention tests on Mac runner

* style: Run make format on test_mps_basic.py

* style: Standardize MPS step naming convention in CI

* ci: Revert MPS trigger to run only on main PRs and pushes

* ci: Remove feature branch from global workflow triggers

* fix: Restore torch.device return type in get_device for API stability

* docs: Align train and device config docstrings with implementation

* fix: Update device type hints to Union[str, torch.device] for consistency

* ci: Update mps-checks trigger to include dev branch

* test: Update device tests for torch.device compatibility and robustness

* ci: Restrict MPS trigger to main branch only

* ci: Pass HF_TOKEN to MPS check jobs

* fix: Revert device type hints and tests to strictly use strings

* ci: Refine MPS test coverage

* cleanup: Remove unused no_mps marker

* fix: Remove unused Union import in lit/model.py

* ci: Temporarily enable dev trigger for verification

* ci: Re-ignore tests that fail with NaNs or precision errors on MPS

* ci: Restrict MPS trigger to main branch

* fix: Use torch.allclose for GQA tests to allow MPS float precision delta (see the sketch after this list)

* ci: Enable MPS checks on dev branches

* style: Run make format on test_grouped_query_attention.py

* ci: Revert MPS trigger to run only on main PRs and pushes

---------

Co-authored-by: Brendan Long <self@brendanlong.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
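
On the torch.allclose change for the GQA tests: exact equality is too strict on MPS, where float accumulation order differs from CPU. The tolerances below are illustrative only, not the values used in the tests:

```python
import torch

cpu_out = torch.randn(4, 8)
mps_out = cpu_out + 1e-6 * torch.randn(4, 8)  # stand-in for an MPS result

assert not torch.equal(cpu_out, mps_out)  # a bitwise match fails
assert torch.allclose(cpu_out, mps_out, rtol=1e-4, atol=1e-5)  # tolerance passes
```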
* Docs: clarify n_params excludes embeddings/biases/layer norms

The previous docstring noted 'Non embedding parameters' and 'Ignoring
biases and layer norms' but users still computed memory footprints from
n_params and got confused (see #448).

Make the exclusions unambiguous and explicitly point to
`sum(p.numel() for p in model.parameters())` for the total parameter
count. No API change.

Fixes #448

* Add n_params_total property for total parameter count

Per #448 (Neel Nanda): users want a parameter count that includes
embeddings/biases, matching HF's model.num_parameters() and the Pythia
reporting convention. cfg.n_params only counts 'hidden weight' params
(scaling-laws convention) which is more useful for predicting performance
but confusing for memory-budget calculations.

Adds an additive HookedTransformer.n_params_total property that returns
sum(p.numel() for p in self.parameters()). Existing cfg.n_params behavior
is unchanged (no API break).

The accompanying docstring update on cfg.n_params makes the distinction
explicit and points users to the new property.

4 unit tests covering attn-only, MLP, equivalence with sum(p.numel()), and return type.
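
Based on the description above, usage should look roughly like this (attribute names taken from the PR text, not checked against the final diff):

```python
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")

# cfg.n_params: 'hidden weight' params only (no embeddings, biases, or
# layer norms) -- the scaling-laws convention, unchanged by this PR.
print(model.cfg.n_params)

# n_params_total: every parameter, matching HF's num_parameters().
assert model.n_params_total == sum(p.numel() for p in model.parameters())
```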

* Fix CI: remove unused torch import flagged by pycln

* Fix CI: don't pass act_fn=None (beartype rejects)

beartype is enabled in CI and validates type hints at runtime.
HookedTransformerConfig declares act_fn: str (not Optional[str]),
so passing None — even when attn_only=True — fails the type check.

Build kwargs dict and only include act_fn/d_mlp when needed.
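
A sketch of the workaround; the config fields are real HookedTransformerConfig fields, the dimensions arbitrary:

```python
from transformer_lens import HookedTransformerConfig

attn_only = True
kwargs = dict(n_layers=1, d_model=8, n_ctx=16, d_head=4, attn_only=attn_only)
if not attn_only:
    # Only mention act_fn/d_mlp when the model actually has an MLP;
    # passing act_fn=None trips beartype's runtime check of `act_fn: str`.
    kwargs.update(act_fn="gelu", d_mlp=32)
cfg = HookedTransformerConfig(**kwargs)
```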

* Address review feedback (#1284)

- Replace tautological test (sum(p.numel()) == sum(p.numel())) with hand-
  computed expected values for two small fixtures (attn-only and with MLP).
  Test docstrings show the breakdown so reviewers can verify by inspection.

- Add end-to-end test on real loaded GPT-2 (cached by CI). Asserts the
  property reflects actual model.parameters() and that the count is in
  the GPT-2 band. Note: a strict HF parity check fails because TL stores
  W_E and W_U as separate Parameters while HF GPT-2 ties lm_head — the
  expected delta is d_vocab * d_model. Documented inline.

- Mirror n_params_total to TransformerBridge for consistency.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
* Updating tokenizer information on TransformerBridge table to be more detailed

* Updating docs build watcher to properly hot-reload for model table changes

* Update Interactive Model table to properly load information again

* Improved docstring to wrap up #99

* Updated `prepend_bos` comments to resolve #100

* Added tokenization tests for the Bridge

* Added new demo to show how to run lm-eval-harness with TransformerBridge

* stripped stale output

* Add optional evals dependencies

* forwarding the details of PR #473 to our modern TransformerBridge system

* Fixing notebook checks for eval

* Clean up issues with demo

* Improve typing
* Add testing and documentation for GatedMLP

* Added BERT tests and additional demo blocks

* Added support for mt5, added a fix for handling T5 models on Transformers v5 (see the sketch after this commit list)

* Verifying support for SimpleStories

* Added additional model support

* Updating the verification system to allow all models from canonical authors

* Add additional canonical authors

* Improved quantization skipping and messages

* Updated scrape with fixes to prevent timeouts and loss of Gaps details

* updated model_properties_table
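
For the headline mT5 support, a hedged loading sketch, assuming mT5 goes through HookedEncoderDecoder the way T5 does (the v3.2.0 entry point may instead be TransformerBridge; check the model table):

```python
from transformer_lens import HookedEncoderDecoder

# Assumption: mT5 loads via the existing T5-family encoder-decoder class.
model = HookedEncoderDecoder.from_pretrained("google/mt5-small")
print(model.cfg.n_layers)
```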
@jlarson4 jlarson4 merged commit 31d4f6a into main May 8, 2026
91 of 96 checks passed