Merge pull request #78 from leehack/feat/chat-app-qwen35-multimodal-stability

leehack · web-flow · commit 8bb8baa4e1d7 · 2026-03-02T23:21:19.000-05:00
feat(chat-app): refresh Qwen3.5 presets and harden multimodal text rendering
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -17,6 +17,11 @@
   * Added WebGPU bridge embedding APIs and wired web backend support for `LlamaEngine.embed(...)` / `embedBatch(...)`.
   * Updated default web bridge asset pinning to `leehack/llama-web-bridge-assets@v0.1.7` (built against llama.cpp `b8157`).
   * Validated the `v0.1.7` bridge bundle through local fetch-script checksum verification.
+* **Chat app model catalog + stability**:
+  * Updated `example/chat_app` recommended Qwen presets to the Qwen3.5 lineup (`0.8B`, `2B`, `4B`, `9B`) and removed older Qwen2.5/Qwen3 defaults from the in-app library.
+  * Added multimodal projector (`mmproj`) wiring for Qwen3.5 model cards and tuned safer multimodal defaults (`contextSize: 8192`, `maxTokens: 1024`).
+  * Fixed Flutter text paint crashes caused by malformed UTF-16 streaming boundaries by aligning incremental reveal to surrogate-pair boundaries and sanitizing text/tool payload rendering paths.
+  * Added sanitizer unit coverage and refreshed chat-app README architecture/troubleshooting sections for multimodal and UTF-16 guidance.
 
 ## 0.6.4
 
diff --git a/example/chat_app/README.md b/example/chat_app/README.md
@@ -34,8 +34,8 @@ flutter test
 Note: this is a Flutter app, so use `flutter test` (not `dart test`).
 
 ### 2. Choose and Download a Model
-1. The app will open to a **Model Selection** screen.
-2. Select one of the pre-configured models (for example: FunctionGemma 270M, Llama 3.2 3B, Qwen 3 4B, Gemma 3/3n, DeepSeek R1 distills).
+1. The app will open to a **Manage Models** screen.
+2. Select one of the pre-configured models (for example: FunctionGemma 270M, Qwen3.5 0.8B/2B/4B/9B, Llama 3.2 3B, Gemma 3/3n, DeepSeek R1 distills).
 3. Tap the **Download** icon. The app uses `Dio` to download the model directly to your device's documents directory.
 4. Once downloaded, tap **Select** to load the model.
 
@@ -79,34 +79,42 @@ The app follows a clean, layered architecture with strict separation of concerns
 
 ```
 lib/
-├── main.dart              # App entry point
+├── main.dart                      # App entry point
 ├── screens/
-│   ├── chat_screen.dart            # Main chat screen
-│   └── model_selection_screen.dart  # Model management UI
+│   ├── app_shell_screen.dart       # Responsive shell/navigation host
+│   ├── chat_screen.dart            # Main chat UI
+│   └── manage_models_screen.dart   # Model library + inference controls
 ├── widgets/
-│   ├── chat_input.dart             # Message input area
-│   ├── message_bubble.dart         # Styled chat bubbles
-│   ├── settings_sheet.dart         # Advanced config UI
+│   ├── chat_input.dart             # Message input + media staging
+│   ├── message_bubble.dart         # Message rendering (markdown/thinking/tool)
+│   ├── model_card.dart             # Model picker cards
+│   ├── tool_declarations_dialog.dart
+│   ├── tool_execution_card.dart
 │   └── ...                         # Other modular UI components
 ├── providers/
 │   └── chat_provider.dart          # App state & orchestration
 ├── services/
-│   ├── chat_service.dart           # Business logic & prompt building
-│   ├── model_service.dart          # File system & download logic
+│   ├── chat_service.dart           # Engine orchestration + prompt cleanup
+│   ├── chat_generation_service.dart
+│   ├── assistant_output_service.dart
+│   ├── model_service_base.dart
+│   ├── model_service_io.dart       # Native download/delete/resume
+│   ├── model_service_web.dart      # Browser cache prefetch/eviction
 │   └── settings_service.dart       # Local persistence (SharedPreferences)
 ├── models/
 │   ├── chat_message.dart           # Message data with token caching
 │   ├── chat_settings.dart          # Configuration data
 │   └── downloadable_model.dart     # Model metadata
-└── stub/
-    └── io_stub.dart                # Web compatibility stubs
+└── utils/
+    ├── backend_utils.dart
+    └── text_sanitizer.dart
 ```
 
 ### Key Components
 
 - **`ChatProvider`**: Orchestrates state and reacts to user input.
 - **`ChatService`**: Handles prompt construction, token counting, and engine interaction.
-- **`ModelService`**: Manages the local model library and background downloads.
+- **`ModelService`**: Manages the model library with native/web-specific download backends.
 - **`SettingsService`**: Handles persistent storage of user preferences.
 - **`ChatMessage`**: Implements **Token Caching** to optimize performance during long conversations.
 
@@ -183,6 +191,16 @@ _(Add screenshots here when complete)_
 - Check if `GPU Layers` is set to a high enough value (default 99 offloads all layers).
 - Use a model with a smaller quantization level (e.g., Q4_K_M).
 
+**Multimodal instability or decode crashes (Qwen3.5 VLMs):**
+- Keep Qwen3.5 model defaults unless you are tuning carefully (`Context Size` 8192, `Max Tokens` 1024).
+- Start a fresh conversation before large image prompts to avoid context-slot pressure.
+- If crashes persist on lower-memory devices, switch to the 0.8B/2B variants or disable multimodal for that run.
+
+**`Invalid argument(s): string is not well-formed UTF-16` in Flutter painting:**
+- This indicates malformed streamed text (broken surrogate pair) reached text rendering.
+- Upgrade to the latest chat app code (stream-boundary + text-sanitization fixes are included).
+- Restart the app fully after upgrade (`flutter clean` + `flutter run`) to ensure stale binaries are not reused.
+
 **Slow model downloads on iOS/Android:**
 - Run on a release/profile build (`flutter run --release`) for realistic transfer performance.
 - Large multimodal bundles download both model and mmproj files; expect two-stage transfer.
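The UTF-16 troubleshooting entry above can be reproduced outside Flutter. Below is a minimal sketch built around a hypothetical diagnostic helper, `isWellFormedUtf16`, which is not part of the chat app. It flags any unpaired surrogate, which is exactly what a code-unit `substring` produces when it cuts through an emoji.

```dart
/// Hypothetical diagnostic helper (not part of the chat app): returns true
/// when every surrogate code unit in [s] belongs to a valid pair.
bool isWellFormedUtf16(String s) {
  for (var i = 0; i < s.length; i++) {
    final unit = s.codeUnitAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // A leading surrogate must be immediately followed by a trailing one.
      if (i + 1 < s.length) {
        final next = s.codeUnitAt(i + 1);
        if (next >= 0xDC00 && next <= 0xDFFF) {
          i++; // Skip the trailing half of this valid pair.
          continue;
        }
      }
      return false;
    }
    if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // A trailing surrogate with no leading partner.
      return false;
    }
  }
  return true;
}
```

For example, `isWellFormedUtf16('Done 😀')` is true. `isWellFormedUtf16('Done 😀'.substring(0, 6))` is false because offset 6 falls between the emoji's two code units, leaving a dangling leading surrogate.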
diff --git a/example/chat_app/lib/models/chat_message.dart b/example/chat_app/lib/models/chat_message.dart
@@ -1,5 +1,7 @@
 import 'package:llamadart/llamadart.dart';
 
+import '../utils/text_sanitizer.dart';
+
 class ChatMessage {
   final String text;
   final bool isUser;
@@ -11,15 +13,16 @@ class ChatMessage {
   int? tokenCount; // Cache token count for sliding window optimization
 
   ChatMessage({
-    required this.text,
+    required String text,
     required this.isUser,
     this.isInfo = false,
     this.parts,
     this.debugBadges = const [],
     this.role,
     DateTime? timestamp,
     this.tokenCount,
-  }) : timestamp = timestamp ?? DateTime.now();
+  }) : text = sanitizeForTextLayout(text),
+       timestamp = timestamp ?? DateTime.now();
 
   /// Derived property to check if this message is a tool call.
   bool get isToolCall {
@@ -38,7 +41,11 @@ class ChatMessage {
   /// Derived property to get thinking content if present.
   String? get thinkingText {
     final thinkingPart = parts?.whereType<LlamaThinkingContent>().firstOrNull;
-    return thinkingPart?.thinking;
+    final thinking = thinkingPart?.thinking;
+    if (thinking == null) {
+      return null;
+    }
+    return sanitizeForTextLayout(thinking);
   }
 
   ChatMessage copyWith({
diff --git a/example/chat_app/lib/models/downloadable_model.dart b/example/chat_app/lib/models/downloadable_model.dart
@@ -87,21 +87,26 @@ class DownloadableModel {
       ),
     ),
     DownloadableModel(
-      name: 'Qwen2.5 0.5B Instruct',
+      name: 'Qwen3.5 0.8B Instruct',
       description:
-          '⚡ Ultra-light (491MB) • Fast and reliable web/mobile starter.',
+          '🆕 Qwen3.5 mini VLM (720MB bundle) • Fast tools, vision, and thinking.',
       url:
-          'https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf?download=true',
-      filename: 'qwen2.5-0.5b-instruct-q4_k_m.gguf',
-      sizeBytes: 491400032,
-      minRamGb: 2,
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-0.8B-GGUF/resolve/main/Qwen_Qwen3.5-0.8B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-0.8B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-0.8B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-0.8B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-0.8B-f16.gguf',
+      sizeBytes: 754903104,
+      minRamGb: 3,
+      supportsVision: true,
       supportsToolCalling: true,
+      supportsThinking: true,
       preset: ModelPreset(
-        temperature: 0.1,
-        topK: 40,
-        topP: 0.9,
-        contextSize: 4096,
-        maxTokens: 2048,
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
       ),
     ),
     DownloadableModel(
@@ -155,21 +160,49 @@ class DownloadableModel {
       ),
     ),
     DownloadableModel(
-      name: 'Qwen2.5 1.5B Instruct',
+      name: 'Qwen3.5 2B Instruct',
       description:
-          '💬 Popular compact assistant (1.12GB) • Strong quality/size ratio.',
+          '🆕 Qwen3.5 compact VLM (1.85GB bundle) • Better quality with tools, vision, and thinking.',
       url:
-          'https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf?download=true',
-      filename: 'qwen2.5-1.5b-instruct-q4_k_m.gguf',
-      sizeBytes: 1117320736,
-      minRamGb: 3,
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-2B-GGUF/resolve/main/Qwen_Qwen3.5-2B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-2B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-2B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-2B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-2B-f16.gguf',
+      sizeBytes: 1983861664,
+      minRamGb: 5,
+      supportsVision: true,
       supportsToolCalling: true,
+      supportsThinking: true,
       preset: ModelPreset(
-        temperature: 0.1,
-        topK: 40,
-        topP: 0.9,
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
         contextSize: 8192,
-        maxTokens: 2048,
+        maxTokens: 1024,
+      ),
+    ),
+    DownloadableModel(
+      name: 'Qwen3.5 4B Instruct',
+      description:
+          '🆕 Qwen3.5 4B VLM (3.29GB bundle) • Strong multimodal reasoner with tool use.',
+      url:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-4B-GGUF/resolve/main/Qwen_Qwen3.5-4B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-4B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-4B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-4B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-4B-f16.gguf',
+      sizeBytes: 3529359968,
+      minRamGb: 8,
+      supportsVision: true,
+      supportsToolCalling: true,
+      supportsThinking: true,
+      preset: ModelPreset(
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
       ),
     ),
     DownloadableModel(
@@ -271,25 +304,6 @@ class DownloadableModel {
         maxTokens: 2048,
       ),
     ),
-    DownloadableModel(
-      name: 'Qwen3 4B',
-      description:
-          '🧠 Thinking + tools (2.50GB) • Best all-around reasoning upgrade.',
-      url:
-          'https://huggingface.co/Qwen/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf?download=true',
-      filename: 'Qwen3-4B-Q4_K_M.gguf',
-      sizeBytes: 2497280256,
-      minRamGb: 6,
-      supportsToolCalling: true,
-      supportsThinking: true,
-      preset: ModelPreset(
-        temperature: 0.6,
-        topK: 20,
-        topP: 0.95,
-        contextSize: 8192,
-        maxTokens: 4096,
-      ),
-    ),
     DownloadableModel(
       name: 'Meta-Llama 3.1 8B Instruct',
       description:
@@ -308,5 +322,28 @@ class DownloadableModel {
         maxTokens: 2048,
       ),
     ),
+    DownloadableModel(
+      name: 'Qwen3.5 9B Instruct',
+      description:
+          '🆕 Qwen3.5 9B VLM (6.32GB bundle) • Highest-quality Qwen option with thinking + tools.',
+      url:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-9B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-9B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-9B-f16.gguf',
+      sizeBytes: 6784286240,
+      minRamGb: 12,
+      supportsVision: true,
+      supportsToolCalling: true,
+      supportsThinking: true,
+      preset: ModelPreset(
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
+      ),
+    ),
   ];
 }
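The new Qwen3.5 cards raise `minRamGb` in steps from 3 to 12 GB, and the README suggests the 0.8B/2B variants for lower-memory devices. The sketch below shows how those thresholds could drive a default choice. `qwen35MinRamGb` and `pickQwen35Variant` are hypothetical names, not chat-app APIs. The values are copied from the cards in this diff.

```dart
/// Hypothetical table: minimum RAM (GB) declared by each Qwen3.5 card in
/// this diff, listed from smallest to largest model.
const qwen35MinRamGb = <String, int>{
  'Qwen3.5 0.8B Instruct': 3,
  'Qwen3.5 2B Instruct': 5,
  'Qwen3.5 4B Instruct': 8,
  'Qwen3.5 9B Instruct': 12,
};

/// Returns the largest variant whose `minRamGb` fits [deviceRamGb], or null
/// when even the 0.8B bundle does not fit.
String? pickQwen35Variant(int deviceRamGb) {
  String? best;
  // Dart map literals preserve insertion order, so each later match is a
  // larger model than the previous one.
  for (final entry in qwen35MinRamGb.entries) {
    if (entry.value <= deviceRamGb) {
      best = entry.key;
    }
  }
  return best;
}
```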
diff --git a/example/chat_app/lib/services/chat_generation_service.dart b/example/chat_app/lib/services/chat_generation_service.dart
@@ -246,13 +246,42 @@ class ChatGenerationService {
 
     final backlog = targetText.length - currentText.length;
     final revealStep = _revealStepForBacklog(backlog);
-    final nextLength = currentText.length + revealStep;
+    var nextLength = currentText.length + revealStep;
     if (nextLength >= targetText.length) {
       return targetText;
     }
+
+    nextLength = _alignToUtf16Boundary(targetText, nextLength);
+    if (nextLength >= targetText.length) {
+      return targetText;
+    }
+
     return targetText.substring(0, nextLength);
   }
 
+  int _alignToUtf16Boundary(String text, int end) {
+    if (end <= 0 || end >= text.length) {
+      return end;
+    }
+
+    final previousCodeUnit = text.codeUnitAt(end - 1);
+    final nextCodeUnit = text.codeUnitAt(end);
+    if (_isLeadingSurrogate(previousCodeUnit) &&
+        _isTrailingSurrogate(nextCodeUnit)) {
+      return end + 1;
+    }
+
+    return end;
+  }
+
+  bool _isLeadingSurrogate(int codeUnit) {
+    return codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
+  }
+
+  bool _isTrailingSurrogate(int codeUnit) {
+    return codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;
+  }
+
   int _revealStepForBacklog(int backlog) {
     if (backlog <= 12) {
       return backlog;
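The reveal change can be exercised in isolation. The sketch below uses a fixed `step` in place of `_revealStepForBacklog`, since only that method's first branch is visible in the hunk. `nextReveal`, `_isLead`, and `_isTrail` are hypothetical names that mirror `_alignToUtf16Boundary` and its helpers.

```dart
/// Sketch of one reveal tick: grow the visible prefix of [target] by [step]
/// code units, but never stop between the halves of a surrogate pair.
String nextReveal(String current, String target, int step) {
  var end = current.length + step;
  if (end >= target.length) return target;
  // If the cut would separate a leading surrogate from its trailing half,
  // extend by one code unit so the whole character is revealed.
  if (end > 0 &&
      _isLead(target.codeUnitAt(end - 1)) &&
      _isTrail(target.codeUnitAt(end))) {
    end++;
  }
  return end >= target.length ? target : target.substring(0, end);
}

bool _isLead(int unit) => unit >= 0xD800 && unit <= 0xDBFF;

bool _isTrail(int unit) => unit >= 0xDC00 && unit <= 0xDFFF;
```

With `step` set to 1, revealing `'ab😀cd'` from `'ab'` jumps straight to `'ab😀'` instead of stopping after the leading surrogate. Aligning forward rather than backward also guarantees that every tick makes progress: with a step of 1, backing off to the previous boundary would return the unchanged prefix.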
diff --git a/example/chat_app/lib/utils/text_sanitizer.dart b/example/chat_app/lib/utils/text_sanitizer.dart
@@ -0,0 +1,40 @@
+String sanitizeForTextLayout(String input) {
+  if (input.isEmpty) {
+    return input;
+  }
+
+  final units = input.codeUnits;
+  final output = StringBuffer();
+
+  for (var i = 0; i < units.length; i++) {
+    final current = units[i];
+
+    if (_isLeadingSurrogate(current)) {
+      if (i + 1 < units.length && _isTrailingSurrogate(units[i + 1])) {
+        output.writeCharCode(current);
+        output.writeCharCode(units[i + 1]);
+        i++;
+      } else {
+        output.writeCharCode(0xFFFD);
+      }
+      continue;
+    }
+
+    if (_isTrailingSurrogate(current)) {
+      output.writeCharCode(0xFFFD);
+      continue;
+    }
+
+    output.writeCharCode(current);
+  }
+
+  return output.toString();
+}
+
+bool _isLeadingSurrogate(int codeUnit) {
+  return codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
+}
+
+bool _isTrailingSurrogate(int codeUnit) {
+  return codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;
+}
diff --git a/example/chat_app/lib/widgets/message_bubble.dart b/example/chat_app/lib/widgets/message_bubble.dart
diff --git a/example/chat_app/lib/widgets/tool_execution_card.dart b/example/chat_app/lib/widgets/tool_execution_card.dart
diff --git a/example/chat_app/test/text_sanitizer_test.dart b/example/chat_app/test/text_sanitizer_test.dart