fix: handle NoneType check for think tokens in TokenizerWrapper #1167
Merged
angeloskath merged 4 commits into ml-explore:main · Apr 21, 2026
Conversation
- Added defensive null checks using `getattr()` for `think_start_id` and `think_end_id` to prevent a `TypeError` on models without thinking tokens.
- Preserved the original validation logic for token sequences longer than 1.
- Added a test case in `tests/test_tokenizers.py` to verify safe access on non-thinking models (e.g., Llama 3.2).
- Verified with the full test suite (191 passed) on Apple Silicon.
The Problem
When using models that do not explicitly define thinking tokens in their configuration (for example, Qwen2.5-Coder or Llama 3.2), accessing `tokenizer.think_start_id` or `tokenizer.think_end_id` triggers a `TypeError: object of type 'NoneType' has no len()`. This occurs because the current implementation assumes `self._think_start_tokens` is always a list, even when it hasn't been initialized for non-thinking models. The crash is particularly disruptive for downstream tools such as Aider or custom inference schedulers that attempt to detect thinking capabilities via these properties.

The Fix
- Added `getattr()` and `if not` logic in both `think_start_id` and `think_end_id`.
- When the underlying token list is `None` or missing, the properties now gracefully return `None` instead of crashing.
- The `ValueError` for token sequences longer than 1 remains intact to ensure backward compatibility and logical consistency for models that do support thinking tokens.

Testing
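The defensive access described above can be sketched as follows. This is a minimal illustration, not the actual `mlx-lm` source: the class name `TokenizerWrapper` and the attribute names (`_think_start_tokens`, `think_start_id`, `think_end_id`) are taken from the PR description, while the constructor and the exact validation message are assumptions.

```python
class TokenizerWrapper:
    """Sketch of the None-safe think-token properties described in the PR."""

    def __init__(self, think_start_tokens=None, think_end_tokens=None):
        # Non-thinking models simply leave these as None.
        self._think_start_tokens = think_start_tokens
        self._think_end_tokens = think_end_tokens

    @staticmethod
    def _single_token_id(tokens, name):
        if not tokens:
            # None or empty: return None instead of crashing on len(None).
            return None
        if len(tokens) > 1:
            # Original validation preserved for multi-token sequences.
            raise ValueError(f"{name} must be a single token")
        return tokens[0]

    @property
    def think_start_id(self):
        # getattr() guards against the attribute never being set at all.
        tokens = getattr(self, "_think_start_tokens", None)
        return self._single_token_id(tokens, "think_start")

    @property
    def think_end_id(self):
        tokens = getattr(self, "_think_end_tokens", None)
        return self._single_token_id(tokens, "think_end")
```

With this shape, a tokenizer built from a non-thinking model yields `None` from both properties, while a single-token thinking model still resolves its IDs and a multi-token sequence still raises `ValueError`.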
Automated Tests
- Added `test_no_think_tokens_safe_access` in `tests/test_tokenizers.py`, which verifies that:
  - `has_thinking` is `False` for non-thinking models.
  - `think_start_id` and `think_end_id` return `None` safely.
  - No `TypeError` or `AttributeError` is raised.
- Ran `pytest tests` on Apple M3 Max (macOS 15, Python 3.12.8).
- Result: `191 passed, 1 skipped, 2 warnings` (the warnings are unrelated SWIG/Python 3.12 deprecations).

Manual Traceback Verification (Before Fix)
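The pre-fix failure can be reproduced in isolation. The snippet below is a hedged sketch of the old property logic as described in the PR (an unconditional `len()` check on `_think_start_tokens`); the class here is a stand-in, not the actual source.

```python
class OldTokenizerWrapper:
    """Stand-in for the pre-fix behavior described in the PR."""

    # Never initialized for non-thinking models such as Llama 3.2.
    _think_start_tokens = None

    @property
    def think_start_id(self):
        # Pre-fix logic (assumed): len() is called without a None check,
        # so a non-thinking model crashes with
        # TypeError: object of type 'NoneType' has no len()
        if len(self._think_start_tokens) > 1:
            raise ValueError("think_start must be a single token")
        return self._think_start_tokens[0]


try:
    OldTokenizerWrapper().think_start_id
except TypeError as e:
    print(e)  # object of type 'NoneType' has no len()
```

This is exactly the `TypeError` quoted in The Problem above, and it is what the new `test_no_think_tokens_safe_access` guards against.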
Checklist