Skip to content

fix: vision image token insertion#671

Merged
dushyantbehl merged 7 commits into
foundation-model-stack:mainfrom
YashasviChaurasia:fix/vision-image-token-insertion
Mar 30, 2026
Merged

fix: vision image token insertion#671
dushyantbehl merged 7 commits into
foundation-model-stack:mainfrom
YashasviChaurasia:fix/vision-image-token-insertion

Conversation

@YashasviChaurasia
Copy link
Copy Markdown
Contributor

Description of the change

This PR includes two critical fixes:

  1. Vision model image token bug - Fixes "Image features and image tokens do not match" error
  2. Transformers v5 API compatibility - Restores compatibility with transformers v4.55+
  3. Test suite fixes - Resolves test failures in CI/CD

Problem 1: Vision Model Training Failure

Error:

ValueError: Image features and image tokens do not match: tokens: 0, features 18432

Root Cause:
The apply_tokenizer_chat_template handler wasn't correctly extracting conversation messages from OpenAI format datasets when conversation_column_name was not explicitly set. This resulted in formatted text without <image> tokens, causing vision model training to fail.

Fix

  • Adds auto-detection for common conversation column names ('messages', 'conversation', 'chat', 'turns')
  • Adds validation to ensure image tokens are present when images exist in the dataset
  • Enhances error messages with actionable guidance

Problem 2: Transformers v5 API Breaking Change

Error:

  AttributeError at line 610: labels = input_ids.clone()
  KeyError: 'clone'

Root Cause:
In transformers v4.55+, apply_chat_template() with return_tensors='pt' changed behavior:

  • Old API (v4.x): Returns {"input_ids": tensor} (dict)
  • New API (v4.55+): Returns tensor directly OR BatchEncoding object (in tox environment)

The code was doing result["input_ids"] which fails when:

  1. result is a tensor (causes IndexError)
  2. result is a BatchEncoding without .clone() method (causes AttributeError)

Solution:
Added robust handling for all three return types in tokenize_and_apply_chat_template_with_masking:

# Handle both old API (dict/BatchEncoding) and new API (tensor)
if hasattr(result, "input_ids"):
    input_ids = result.input_ids  # BatchEncoding or dict-like
elif isinstance(result, dict):
    input_ids = result["input_ids"]  # Plain dict
else:
    input_ids = result  # Direct tensor

Problem 3: Test Suite Failures

3a. test_empty_data

Error:

StopIteration (expected DatasetGenerationError or ValueError)

Root Cause:
Datasets library in transformers v5 raises StopIteration when processing empty JSON files.

Solution:
Added StopIteration to expected exceptions in the test.

3b. test_run_chat_style_ft_using_custom_split_name

Error:

 NotImplementedError: "histogram_mps" not implemented for 'Int'

Root Cause:
MoE models use histogram operations that are incompatible with Apple Silicon MPS backend.

Solution:
Skip test on MPS-only systems using @pytest.mark.skipif.

Related issue number

How to verify the PR

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass

…lity

Signed-off-by: yashasvi <yashasvi@ibm.com>
@github-actions
Copy link
Copy Markdown

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

@github-actions github-actions Bot added the fix label Mar 18, 2026
Signed-off-by: yashasvi <yashasvi@ibm.com>
@YashasviChaurasia YashasviChaurasia force-pushed the fix/vision-image-token-insertion branch from 1c7bd0d to b2344fd Compare March 18, 2026 07:04
@dushyantbehl
Copy link
Copy Markdown
Collaborator

/build

Comment thread tests/test_sft_trainer.py

@pytest.mark.skipif(
torch.backends.mps.is_available() and not torch.cuda.is_available(),
reason="MoE models have histogram incompatibility with MPS backend",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we adding this here? this test was running fine without anything right? is it not running on mac now?
if so what model are we using which is MoE? can we choose another?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh I did miss this, this was for my local testing tho.. the test was failing on my mac locally..
Shouldn't be a problem for github actions test coverage btw

@github-actions
Copy link
Copy Markdown

Build succeeded for b2344fd (NVCR image)

View run

@dushyantbehl
Copy link
Copy Markdown
Collaborator

/build

@github-actions
Copy link
Copy Markdown

Build failed for b2344fd (NVCR image)

View run

@dushyantbehl
Copy link
Copy Markdown
Collaborator

/build

@github-actions
Copy link
Copy Markdown

Build succeeded for b2344fd (NVCR image)

View run

@github-actions
Copy link
Copy Markdown

Build succeeded for b2344fd (NVCR image)

View run

Signed-off-by: yashasvi <yashasvi@ibm.com>
Signed-off-by: yashasvi <yashasvi@ibm.com>
@YashasviChaurasia YashasviChaurasia force-pushed the fix/vision-image-token-insertion branch from 400238c to 45772b3 Compare March 27, 2026 07:14
Signed-off-by: Yashasvi Chaurasia <46622381+YashasviChaurasia@users.noreply.github.com>
Copy link
Copy Markdown
Collaborator

@dushyantbehl dushyantbehl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @YashasviChaurasia please fix the resolve conflicts and we can merge this PR.

Signed-off-by: Yashasvi Chaurasia <46622381+YashasviChaurasia@users.noreply.github.com>
@dushyantbehl
Copy link
Copy Markdown
Collaborator

/build

@github-actions
Copy link
Copy Markdown

Build failed for f9d223f (NVCR image)

View run

Signed-off-by: yashasvi <yashasvi@ibm.com>
@YashasviChaurasia
Copy link
Copy Markdown
Contributor Author

/build

@github-actions
Copy link
Copy Markdown

Build failed for 992823d (NVCR image)

View run

@github-actions
Copy link
Copy Markdown

Build failed for 992823d (NVCR image)

View run

@YashasviChaurasia
Copy link
Copy Markdown
Contributor Author

/build

@github-actions
Copy link
Copy Markdown

Build failed for 992823d (NVCR image)

View run

@dushyantbehl dushyantbehl merged commit d3c30a0 into foundation-model-stack:main Mar 30, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants