
fix: handle NoneType check for think tokens in TokenizerWrapper#1167

Merged
angeloskath merged 4 commits into ml-explore:main from yuetyeelo2855:fix/tokenizer-think-tokens-none-check
Apr 21, 2026

Conversation

@yuetyeelo2855
Contributor

The Problem

When using models that do not explicitly define thinking tokens in their configuration (for example, Qwen2.5-Coder or Llama 3.2), accessing tokenizer.think_start_id or tokenizer.think_end_id triggers a TypeError: object of type 'NoneType' has no len().

This occurs because the current implementation assumes self._think_start_tokens is always a list, even when it hasn't been initialized for non-thinking models. This crash is particularly disruptive for downstream tools like Aider or custom inference schedulers that attempt to detect thinking capabilities via these properties.

The Fix

  • Added a non-intrusive defensive check to both think_start_id and think_end_id: each property now reads the token list via getattr() and tests it for falsiness before calling len().
  • If the thinking tokens are None or missing, the properties now gracefully return None instead of crashing.
  • Preserved original validation: The ValueError for token sequences longer than 1 remains intact to ensure backward compatibility and logical consistency for models that do support thinking tokens.
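The fix described above can be sketched as follows. This is a hypothetical, simplified reconstruction based on the PR description, not the actual mlx_lm source; attribute names (`_think_start_tokens`, `_think_end_tokens`) and the example token IDs are assumptions.

```python
class TokenizerWrapper:
    """Minimal stand-in for the real wrapper, showing only the think-token logic."""

    def __init__(self, think_start_tokens=None, think_end_tokens=None):
        # Non-thinking models (e.g. Qwen2.5-Coder, Llama 3.2) leave these as None.
        self._think_start_tokens = think_start_tokens
        self._think_end_tokens = think_end_tokens

    @property
    def think_start_id(self):
        tokens = getattr(self, "_think_start_tokens", None)
        if not tokens:
            # Missing or None: return None instead of crashing on len(None).
            return None
        if len(tokens) > 1:
            # Original validation preserved for multi-token sequences.
            raise ValueError("Expected a single think start token")
        return tokens[0]

    @property
    def think_end_id(self):
        tokens = getattr(self, "_think_end_tokens", None)
        if not tokens:
            return None
        if len(tokens) > 1:
            raise ValueError("Expected a single think end token")
        return tokens[0]
```

With this shape, downstream callers such as schedulers can probe `tokenizer.think_start_id` on any model and simply get `None` when thinking tokens are not configured.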

Testing

Automated Tests

  • Added a new test case test_no_think_tokens_safe_access in tests/test_tokenizers.py which:
    • Verifies has_thinking is False for non-thinking models.
    • Verifies think_start_id and think_end_id return None safely.
    • Ensures no TypeError or AttributeError is raised.
  • Full Suite Result: Ran pytest tests on Apple M3 Max (macOS 15, Python 3.12.8).
    • Result: 191 passed, 1 skipped, 2 warnings (Warnings are unrelated SWIG/Python 3.12 deprecations).

Manual Traceback Verification (Before Fix)

Traceback (most recent call last):
  File "/opt/homebrew/Cellar/omlx/1/libexec/lib/python3.11/site-packages/omlx/engine_core.py", line 175, in _engine_loop
    output = await loop.run_in_executor(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.15/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/1/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 3539, in step
    scheduled, rejected = self._schedule_waiting()
                          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/1/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 2846, in _schedule_waiting
    if self._detect_needs_think_prefix(request):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/1/libexec/lib/python3.11/site-packages/omlx/scheduler.py", line 1427, in _detect_needs_think_prefix
    think_start_id = getattr(self.tokenizer, 'think_start_id', None)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/omlx/1/libexec/lib/python3.11/site-packages/mlx_lm/tokenizer_utils.py", line 400, in think_start_id
    if len(self._think_start_tokens) > 1:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

Checklist

  • My code follows the style guidelines of this project.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.

yuetyeelo2855 and others added 3 commits April 19, 2026 14:27
- Added defensive null checks using getattr() for think_start_id and
  think_end_id to prevent TypeError on models without thinking tokens.
- Preserved original validation logic for token sequences longer than 1.
- Added a comprehensive test case in tests/test_tokenizers.py to verify
  safe access on non-thinking models (e.g., Llama 3.2).
- Verified with full test suite (191 passed) on Apple Silicon.
Member

@angeloskath angeloskath left a comment


Thanks!

@angeloskath angeloskath merged commit e1c24b3 into ml-explore:main Apr 21, 2026
2 checks passed
@yuetyeelo2855 yuetyeelo2855 deleted the fix/tokenizer-think-tokens-none-check branch April 26, 2026 03:44
