Add Gemma4Unified models and logits bias handling for suppress tokens
- Introduced Gemma4UnifiedForConditionalGeneration and Gemma4UnifiedVisionAudioModel classes to enhance multimodal capabilities.
- Implemented functionality to retrieve and apply suppress tokens from generation configuration, improving model output control.
- Updated tensor modification logic to accommodate new model architectures and ensure proper handling of positional embeddings.
- Enhanced logits bias handling in the C++ implementation to mirror the suppress tokens functionality, so the model does not emit <image|> and <audio|> tokens (a known issue with the checkpoint).
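The suppress-token handling described above can be illustrated with a minimal standalone sketch. It is not the code from this commit: the helper names (`make_suppress_bias`, `softcap_logits`, `add_bias`, `argmax`) are hypothetical, and it works on plain `std::vector<float>` rather than ggml tensors. It assumes the bias is an additive vector holding negative infinity at each suppressed token id and is applied after final logit softcapping, which is the order the diff suggests.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: additive bias that is 0 everywhere except -inf
// at the suppressed token ids. Out-of-range ids are ignored.
std::vector<float> make_suppress_bias(std::size_t n_vocab,
                                      const std::vector<int32_t> & suppress) {
    std::vector<float> bias(n_vocab, 0.0f);
    for (int32_t id : suppress) {
        if (id >= 0 && static_cast<std::size_t>(id) < n_vocab) {
            bias[static_cast<std::size_t>(id)] = -std::numeric_limits<float>::infinity();
        }
    }
    return bias;
}

// Final logit softcapping: x -> cap * tanh(x / cap).
void softcap_logits(std::vector<float> & logits, float cap) {
    for (float & x : logits) {
        x = cap * std::tanh(x / cap);
    }
}

// Element-wise add of the bias onto the logits.
void add_bias(std::vector<float> & logits, const std::vector<float> & bias) {
    assert(logits.size() == bias.size());
    for (std::size_t i = 0; i < logits.size(); ++i) {
        logits[i] += bias[i];
    }
}

// Index of the largest logit (greedy pick); expects a non-empty vector.
std::size_t argmax(const std::vector<float> & logits) {
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) {
            best = i;
        }
    }
    return best;
}
```

Calling `softcap_logits`, then `add_bias` with the output of `make_suppress_bias`, leaves suppressed ids at negative infinity, so neither greedy selection nor softmax sampling can pick them. Adding the bias before softcapping would not work in this sketch, because `tanh` would map negative infinity back to the finite value `-cap`.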
     cur = ggml_scale(ctx0, cur, hparams.f_final_logit_softcapping);
 }

+// apply logits bias if needed (e.g. for gemma4_unified patch)
+// this is to mirror the suppress_tokens patch on transformers, to keep the model from outputting <image|> and <audio|> tokens (which is a known issue related to the checkpoint)
+// TODO: maybe handle this inside the sampling system in the future
+if (!model.vocab.get_suppress_tokens().empty()) {
+    auto inp_bias = std::make_unique<llm_graph_input_logits_bias>(model.vocab);