Conversation
* Fix tokenizer-free generate() and add regression tests
* Drop accidental tokenize_utils.py changes (superseded by #1273 on dev)
* mypy and make check-format

---------

Co-authored-by: jlarson4 <jonahalarson@comcast.net>
…el is passed. This avoids unnecessary network calls. (#1279)
* Fix: IOIDataset generates diverse samples (Fixes #515)

  random.seed(42) was called at the top of get_sample() on every invocation, resetting the RNG before each draw. As a result, all samples in the dataset were identical regardless of num_samples.

  Changes:
  - Remove random.seed(42) from get_sample()
  - Add an optional seed parameter to __init__() for reproducible datasets (seed=None by default, fully backward compatible)
  - Add a docstring to __init__()
  - Add 5 unit tests covering diversity, reproducibility, and edge cases

  Fixes #515

* Fix CI: black formatting + seed the IOIDataset doctest

  - Apply black formatting to test_evals_ioi.py
  - Update the IOIDataset doctest to use seed=42 for deterministic results. The doctest previously passed because all samples were identical (the bug being fixed); now that samples are diverse, the result depends on RNG state, so we seed for reproducibility.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
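The seeding bug described above can be sketched in a few lines. This is an illustrative stand-in, not the actual IOIDataset code: the class name, method bodies, and value ranges are invented for the example.

```python
import random

def get_sample_buggy():
    random.seed(42)  # bug pattern: reseeding on every call resets the RNG,
    return random.randint(0, 10**6)  # so every "sample" is identical

class DatasetSketch:
    """Illustrative stand-in for the fixed IOIDataset."""

    def __init__(self, seed=None):
        # fix pattern: seed a private RNG once at construction;
        # seed=None keeps the old non-deterministic default
        self._rng = random.Random(seed)

    def get_sample(self):
        return self._rng.randint(0, 10**6)

buggy_draws = {get_sample_buggy() for _ in range(100)}
diverse_draws = {DatasetSketch(seed=42).get_sample() or DatasetSketch(seed=None)._rng.randint(0, 10**6) for _ in range(100)}  # see note below

ds = DatasetSketch(seed=42)
diverse_draws = {ds.get_sample() for _ in range(100)}
print(len(buggy_draws))    # 1: the buggy version collapses to a single value
print(len(diverse_draws))  # many distinct values
```

Two datasets constructed with the same seed produce the same sequence, which is what makes the doctest deterministic after the fix.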
#1283)

* Fix: preserve tokenizer.padding_side when reloading with add_bos_token

  When get_tokenizer_with_bos() reloads a tokenizer via AutoTokenizer.from_pretrained(), HuggingFace silently resets padding_side to its default (usually 'right'). This caused a user-set padding_side='left' to be discarded when the tokenizer was passed into HookedTransformer, affecting Gemma, Falcon, and other decoder models that need left padding for batched generation.

  Fix: copy padding_side from the original tokenizer to the reloaded one. Add 3 regression tests covering left, right, and the no-reload path.

  Fixes #801

* Fix CI: revert unrelated black 23 reformat noise
* Re-trigger CI (Othello flake)

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
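The copy-the-attribute-back fix follows this shape. A minimal dependency-free sketch, where `reload_fn` and `SimpleNamespace` stand in for AutoTokenizer.from_pretrained and a real tokenizer; the function name is invented and this is not the actual get_tokenizer_with_bos code:

```python
from types import SimpleNamespace

def reload_preserving_padding_side(original, reload_fn):
    # Reload, then copy padding_side back from the original tokenizer,
    # since the reload resets it to the library default.
    reloaded = reload_fn()
    reloaded.padding_side = original.padding_side
    return reloaded

# A user sets left padding for batched generation...
original = SimpleNamespace(padding_side="left")
# ...and without the fix the reload would silently reset it to "right".
reloaded = reload_preserving_padding_side(
    original, lambda: SimpleNamespace(padding_side="right")
)
print(reloaded.padding_side)  # left
```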
* Fix type of HookedTransformerConfig.device (#1230)

  This is typed as `Optional[str]` but sometimes returns `torch.device`. Updated the code to just return the `str` instead of wrapping with a device. I'm not confident that every function which takes a device will always be passed a string, so I didn't change functions like warn_if_mps. Found while working on #1219

* more cleanup

* 3.0 CI Bugs (#1261)
  * Fixing `utils` imports
  * skip gated notebooks on PR from forks
  * Updating notebooks
  * Ensure LLaMA only runs when HF_TOKEN is available

* feat: Add MPS CI runner support (#1264)
  * ci: Enable runs on feature branch
  * fix: Skip heavy model_bridge unit tests on MPS runner due to memory limits
  * fix: Ignore flaky grouped query attention tests on Mac runner
  * style: Run make format on test_mps_basic.py
  * style: Standardize MPS step naming convention in CI
  * ci: Revert MPS trigger to run only on main PRs and pushes
  * ci: Remove feature branch from global workflow triggers
  * fix: Restore torch.device return type in get_device for API stability
  * docs: Align train and device config docstrings with implementation
  * fix: Update device type hints to Union[str, torch.device] for consistency
  * ci: Update mps-checks trigger to include dev branch
  * test: Update device tests for torch.device compatibility and robustness
  * ci: Restrict MPS trigger to main branch only
  * ci: Pass HF_TOKEN to MPS check jobs
  * fix: Revert device type hints and tests to strictly use strings
  * ci: Refine MPS test coverage
  * cleanup: Remove unused no_mps marker
  * fix: Remove unused Union import in lit/model.py
  * ci: Temporarily enable dev trigger for verification
  * ci: Re-ignore tests that fail with NaNs or precision errors on MPS
  * ci: Restrict MPS trigger to main branch
  * fix: Use torch.allclose for GQA tests to allow MPS float precision delta
  * ci: Enable MPS checks on dev branches
  * style: Run make format on test_grouped_query_attention.py
  * ci: Revert MPS trigger to run only on main PRs and pushes

---------

Co-authored-by: Brendan Long <self@brendanlong.com>
Co-authored-by: jlarson4 <jonahalarson@comcast.net>
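The device-type fix above normalizes a value that may arrive as either a string or a `torch.device` back to a plain `str`, matching the `Optional[str]` annotation. A dependency-free sketch of that idea, where `DeviceSketch` and `normalize_device` are invented stand-ins (the real code works with `torch.device` inside HookedTransformerConfig):

```python
class DeviceSketch:
    """Stand-in for torch.device, so the sketch needs no torch install."""

    def __init__(self, spec):
        self.spec = spec

    def __str__(self):
        return self.spec

def normalize_device(device):
    # Return a plain str whether the caller handed us a str or a
    # device object, so the Optional[str] annotation holds.
    if device is None:
        return None
    return device if isinstance(device, str) else str(device)

print(normalize_device("cuda:0"))             # cuda:0
print(normalize_device(DeviceSketch("mps")))  # mps
```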
* Docs: clarify n_params excludes embeddings/biases/layer norms

  The previous docstring noted 'Non embedding parameters' and 'Ignoring biases and layer norms', but users still computed memory footprints from n_params and got confused (see #448). Make the exclusions unambiguous and explicitly point to `sum(p.numel() for p in model.parameters())` for the total parameter count. No API change. Fixes #448

* Add n_params_total property for total parameter count

  Per #448 (Neel Nanda): users want a parameter count that includes embeddings/biases, matching HF's model.num_parameters() and the Pythia reporting convention. cfg.n_params only counts 'hidden weight' params (scaling-laws convention), which is more useful for predicting performance but confusing for memory-budget calculations.

  Adds an additive HookedTransformer.n_params_total property that returns sum(p.numel() for p in self.parameters()). Existing cfg.n_params behavior is unchanged (no API break). The accompanying docstring update on cfg.n_params makes the distinction explicit and points users to the new property. 4 unit tests covering attn-only, MLP, equivalence with sum(p.numel()), and return type.

* Fix CI: remove unused torch import flagged by pycln

* Fix CI: don't pass act_fn=None (beartype rejects)

  beartype is enabled in CI and validates type hints at runtime. HookedTransformerConfig declares act_fn: str (not Optional[str]), so passing None, even when attn_only=True, fails the type check. Build a kwargs dict and only include act_fn/d_mlp when needed.

* Address review feedback (#1284)

  - Replace the tautological test (sum(p.numel()) == sum(p.numel())) with hand-computed expected values for two small fixtures (attn-only and with MLP). Test docstrings show the breakdown so reviewers can verify by inspection.
  - Add an end-to-end test on real loaded GPT-2 (cached by CI). Asserts the property reflects actual model.parameters() and that the count is in the GPT-2 band. Note: a strict HF parity check fails because TL stores W_E and W_U as separate Parameters while HF GPT-2 ties lm_head; the expected delta is d_vocab * d_model. Documented inline.
  - Mirror n_params_total to TransformerBridge for consistency.

---------

Co-authored-by: Divij Chawla <divijchawla7@users.noreply.github.com>
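The distinction between the two counts can be shown by hand. The shapes below are invented for a tiny 1-layer, 2-head attn-only example, not a real config; only the counting convention mirrors the n_params vs n_params_total split described above:

```python
from math import prod

# Hypothetical weight shapes: d_vocab=100, d_model=16, n_ctx=32,
# n_heads=2, d_head=8. All numbers are made up for illustration.
shapes = {
    "W_E": (100, 16),     # token embedding   (excluded from cfg.n_params)
    "W_pos": (32, 16),    # positional embed  (excluded)
    "W_Q": (2, 16, 8),
    "W_K": (2, 16, 8),
    "W_V": (2, 16, 8),
    "W_O": (2, 8, 16),
    "W_U": (16, 100),     # unembedding       (excluded)
}
hidden = ("W_Q", "W_K", "W_V", "W_O")

n_params = sum(prod(shapes[k]) for k in hidden)         # scaling-laws count
n_params_total = sum(prod(s) for s in shapes.values())  # memory-budget count
print(n_params, n_params_total)  # 1024 4736
```

The gap (3712 here) is exactly the embedding, positional, and unembedding weights, which is why memory estimates built on cfg.n_params came out low in #448.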
* Updating tokenizer information on TransformerBridge table to be more detailed
* Updating docs build watcher to properly hot-reload for model table changes
* Update Interactive Model table to properly load information again
* Improved docstring to wrap up #99
* Updated `prepend_bos` comments to resolve #100
* Added tokenization tests for the Bridge
* Added new demo to show how to run lm-eval-harness with TransformerBridge
* Stripped stale output
* Add optional evals dependencies
* Forwarding the details of PR #473 to our modern TransformerBridge system
* Fixing notebook checks for eval
* Clean up issues with demo
* Improve typing
* Add testing and documentation for GatedMLP
* Added BERT tests and additional demo blocks
* Added support for mt5, added a fix for handling T5 models on Transformers v5
* Verifying support for SimpleStories
* Added additional model support
* Updating verification system to allow all canonical authored models into the system
* Add additional canonical authors
* Improved quantization skipping and messages
* Updated scrape, fixes to prevent timeout and loss of Gaps details
* Updated model_properties_table
Description
Added mT5 support, improved quantization support, and several updates related to older issues. See the 3.2 release log for full details.