
Commit 8bb8baa

Merge pull request #78 from leehack/feat/chat-app-qwen35-multimodal-stability
feat(chat-app): refresh Qwen3.5 presets and harden multimodal text rendering
2 parents 6ebbe62 + 08481fd commit 8bb8baa

9 files changed

Lines changed: 259 additions & 76 deletions

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
@@ -17,6 +17,11 @@
 * Added WebGPU bridge embedding APIs and wired web backend support for `LlamaEngine.embed(...)` / `embedBatch(...)`.
 * Updated default web bridge asset pinning to `leehack/llama-web-bridge-assets@v0.1.7` (built against llama.cpp `b8157`).
 * Validated the `v0.1.7` bridge bundle through local fetch-script checksum verification.
+* **Chat app model catalog + stability**:
+* Updated `example/chat_app` recommended Qwen presets to the Qwen3.5 lineup (`0.8B`, `2B`, `4B`, `9B`) and removed older Qwen2.5/Qwen3 defaults from the in-app library.
+* Added multimodal projector (`mmproj`) wiring for Qwen3.5 model cards and tuned safer multimodal defaults (`contextSize: 8192`, `maxTokens: 1024`).
+* Fixed Flutter text paint crashes caused by malformed UTF-16 streaming boundaries by aligning incremental reveal to surrogate-pair boundaries and sanitizing text/tool payload rendering paths.
+* Added sanitizer unit coverage and refreshed chat-app README architecture/troubleshooting sections for multimodal and UTF-16 guidance.
 
 ## 0.6.4

example/chat_app/README.md

Lines changed: 31 additions & 13 deletions
@@ -34,8 +34,8 @@ flutter test
 Note: this is a Flutter app, so use `flutter test` (not `dart test`).
 
 ### 2. Choose and Download a Model
-1. The app will open to a **Model Selection** screen.
-2. Select one of the pre-configured models (for example: FunctionGemma 270M, Llama 3.2 3B, Qwen 3 4B, Gemma 3/3n, DeepSeek R1 distills).
+1. The app will open to a **Manage Models** screen.
+2. Select one of the pre-configured models (for example: FunctionGemma 270M, Qwen3.5 0.8B/2B/4B/9B, Llama 3.2 3B, Gemma 3/3n, DeepSeek R1 distills).
 3. Tap the **Download** icon. The app uses `Dio` to download the model directly to your device's documents directory.
 4. Once downloaded, tap **Select** to load the model.
 
@@ -79,34 +79,42 @@ The app follows a clean, layered architecture with strict separation of concerns
 
 ```
 lib/
-├── main.dart # App entry point
+├── main.dart # App entry point
 ├── screens/
-│   ├── chat_screen.dart # Main chat screen
-│   └── model_selection_screen.dart # Model management UI
+│   ├── app_shell_screen.dart # Responsive shell/navigation host
+│   ├── chat_screen.dart # Main chat UI
+│   └── manage_models_screen.dart # Model library + inference controls
 ├── widgets/
-│   ├── chat_input.dart # Message input area
-│   ├── message_bubble.dart # Styled chat bubbles
-│   ├── settings_sheet.dart # Advanced config UI
+│   ├── chat_input.dart # Message input + media staging
+│   ├── message_bubble.dart # Message rendering (markdown/thinking/tool)
+│   ├── model_card.dart # Model picker cards
+│   ├── tool_declarations_dialog.dart
+│   ├── tool_execution_card.dart
 │   └── ... # Other modular UI components
 ├── providers/
 │   └── chat_provider.dart # App state & orchestration
 ├── services/
-│   ├── chat_service.dart # Business logic & prompt building
-│   ├── model_service.dart # File system & download logic
+│   ├── chat_service.dart # Engine orchestration + prompt cleanup
+│   ├── chat_generation_service.dart
+│   ├── assistant_output_service.dart
+│   ├── model_service_base.dart
+│   ├── model_service_io.dart # Native download/delete/resume
+│   ├── model_service_web.dart # Browser cache prefetch/eviction
 │   └── settings_service.dart # Local persistence (SharedPreferences)
 ├── models/
 │   ├── chat_message.dart # Message data with token caching
 │   ├── chat_settings.dart # Configuration data
 │   └── downloadable_model.dart # Model metadata
-└── stub/
-    └── io_stub.dart # Web compatibility stubs
+└── utils/
+    ├── backend_utils.dart
+    └── text_sanitizer.dart
 ```
 
 ### Key Components
 
 - **`ChatProvider`**: Orchestrates state and reacts to user input.
 - **`ChatService`**: Handles prompt construction, token counting, and engine interaction.
-- **`ModelService`**: Manages the local model library and background downloads.
+- **`ModelService`**: Manages the model library with native/web-specific download backends.
 - **`SettingsService`**: Handles persistent storage of user preferences.
 - **`ChatMessage`**: Implements **Token Caching** to optimize performance during long conversations.
 
@@ -183,6 +191,16 @@ _(Add screenshots here when complete)_
 - Check if `GPU Layers` is set to a high enough value (default 99 offloads all layers).
 - Use a model with a smaller quantization level (e.g., Q4_K_M).
 
+**Multimodal instability or decode crashes (Qwen3.5 VLMs):**
+- Keep Qwen3.5 model defaults unless you are tuning carefully (`Context Size` 8192, `Max Tokens` 1024).
+- Start a fresh conversation before large image prompts to avoid context-slot pressure.
+- If crashes persist on lower-memory devices, switch to the 0.8B/2B variants or disable multimodal for that run.
+
+**`Invalid argument(s): string is not well-formed UTF-16` in Flutter painting:**
+- This indicates malformed streamed text (broken surrogate pair) reached text rendering.
+- Upgrade to the latest chat app code (stream-boundary + text-sanitization fixes are included).
+- Restart the app fully after upgrade (`flutter clean` + `flutter run`) to ensure stale binaries are not reused.
+
 **Slow model downloads on iOS/Android:**
 - Run on a release/profile build (`flutter run --release`) for realistic transfer performance.
 - Large multimodal bundles download both model and mmproj files; expect two-stage transfer.
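
The "broken surrogate pair" named in the UTF-16 troubleshooting entry can be reproduced outside Flutter. The following is a minimal sketch, not code from this commit: `hasLoneSurrogate` is a hypothetical helper, and the sample string is an assumption chosen so that `substring` ends between the two halves of an emoji.

```dart
/// Hypothetical helper: true if [s] contains a surrogate code unit
/// that is not part of a valid leading+trailing pair.
bool hasLoneSurrogate(String s) {
  for (var i = 0; i < s.length; i++) {
    final unit = s.codeUnitAt(i);
    final isLead = unit >= 0xD800 && unit <= 0xDBFF;
    final isTrail = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isLead) {
      final hasPartner = i + 1 < s.length &&
          s.codeUnitAt(i + 1) >= 0xDC00 &&
          s.codeUnitAt(i + 1) <= 0xDFFF;
      if (!hasPartner) return true;
      i++; // Skip the trailing half of a valid pair.
    } else if (isTrail) {
      return true; // Trailing half with no leading half before it.
    }
  }
  return false;
}

void main() {
  const text = 'ok \u{1F600}'; // The emoji occupies two UTF-16 code units.
  print(text.length); // 5
  final cut = text.substring(0, 4); // Ends between the two halves.
  print(hasLoneSurrogate(text)); // false
  print(hasLoneSurrogate(cut)); // true
}
```

Any code that slices streamed text by code-unit index can produce a string like `cut`, which is the kind of input the troubleshooting entry describes reaching text rendering.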

example/chat_app/lib/models/chat_message.dart

Lines changed: 10 additions & 3 deletions
@@ -1,5 +1,7 @@
 import 'package:llamadart/llamadart.dart';
 
+import '../utils/text_sanitizer.dart';
+
 class ChatMessage {
   final String text;
   final bool isUser;
@@ -11,15 +13,16 @@ class ChatMessage {
   int? tokenCount; // Cache token count for sliding window optimization
 
   ChatMessage({
-    required this.text,
+    required String text,
     required this.isUser,
     this.isInfo = false,
     this.parts,
     this.debugBadges = const [],
     this.role,
     DateTime? timestamp,
     this.tokenCount,
-  }) : timestamp = timestamp ?? DateTime.now();
+  }) : text = sanitizeForTextLayout(text),
+       timestamp = timestamp ?? DateTime.now();
 
   /// Derived property to check if this message is a tool call.
   bool get isToolCall {
@@ -38,7 +41,11 @@ class ChatMessage {
   /// Derived property to get thinking content if present.
   String? get thinkingText {
     final thinkingPart = parts?.whereType<LlamaThinkingContent>().firstOrNull;
-    return thinkingPart?.thinking;
+    final thinking = thinkingPart?.thinking;
+    if (thinking == null) {
+      return null;
+    }
+    return sanitizeForTextLayout(thinking);
   }
 
   ChatMessage copyWith({

example/chat_app/lib/models/downloadable_model.dart

Lines changed: 77 additions & 40 deletions
@@ -87,21 +87,26 @@ class DownloadableModel {
       ),
     ),
     DownloadableModel(
-      name: 'Qwen2.5 0.5B Instruct',
+      name: 'Qwen3.5 0.8B Instruct',
       description:
-          '⚡ Ultra-light (491MB) • Fast and reliable web/mobile starter.',
+          '🆕 Qwen3.5 mini VLM (720MB bundle) • Fast tools, vision, and thinking.',
       url:
-          'https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_k_m.gguf?download=true',
-      filename: 'qwen2.5-0.5b-instruct-q4_k_m.gguf',
-      sizeBytes: 491400032,
-      minRamGb: 2,
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-0.8B-GGUF/resolve/main/Qwen_Qwen3.5-0.8B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-0.8B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-0.8B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-0.8B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-0.8B-f16.gguf',
+      sizeBytes: 754903104,
+      minRamGb: 3,
+      supportsVision: true,
       supportsToolCalling: true,
+      supportsThinking: true,
       preset: ModelPreset(
-        temperature: 0.1,
-        topK: 40,
-        topP: 0.9,
-        contextSize: 4096,
-        maxTokens: 2048,
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
       ),
     ),
     DownloadableModel(
@@ -155,21 +160,49 @@ class DownloadableModel {
       ),
     ),
     DownloadableModel(
-      name: 'Qwen2.5 1.5B Instruct',
+      name: 'Qwen3.5 2B Instruct',
       description:
-          '💬 Popular compact assistant (1.12GB) • Strong quality/size ratio.',
+          '🆕 Qwen3.5 compact VLM (1.85GB bundle) • Better quality with tools, vision, and thinking.',
       url:
-          'https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct-GGUF/resolve/main/qwen2.5-1.5b-instruct-q4_k_m.gguf?download=true',
-      filename: 'qwen2.5-1.5b-instruct-q4_k_m.gguf',
-      sizeBytes: 1117320736,
-      minRamGb: 3,
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-2B-GGUF/resolve/main/Qwen_Qwen3.5-2B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-2B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-2B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-2B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-2B-f16.gguf',
+      sizeBytes: 1983861664,
+      minRamGb: 5,
+      supportsVision: true,
       supportsToolCalling: true,
+      supportsThinking: true,
       preset: ModelPreset(
-        temperature: 0.1,
-        topK: 40,
-        topP: 0.9,
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
         contextSize: 8192,
-        maxTokens: 2048,
+        maxTokens: 1024,
+      ),
+    ),
+    DownloadableModel(
+      name: 'Qwen3.5 4B Instruct',
+      description:
+          '🆕 Qwen3.5 4B VLM (3.29GB bundle) • Strong multimodal reasoner with tool use.',
+      url:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-4B-GGUF/resolve/main/Qwen_Qwen3.5-4B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-4B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-4B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-4B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-4B-f16.gguf',
+      sizeBytes: 3529359968,
+      minRamGb: 8,
+      supportsVision: true,
+      supportsToolCalling: true,
+      supportsThinking: true,
+      preset: ModelPreset(
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
       ),
     ),
     DownloadableModel(
@@ -271,25 +304,6 @@ class DownloadableModel {
         maxTokens: 2048,
       ),
     ),
-    DownloadableModel(
-      name: 'Qwen3 4B',
-      description:
-          '🧠 Thinking + tools (2.50GB) • Best all-around reasoning upgrade.',
-      url:
-          'https://huggingface.co/Qwen/Qwen3-4B-GGUF/resolve/main/Qwen3-4B-Q4_K_M.gguf?download=true',
-      filename: 'Qwen3-4B-Q4_K_M.gguf',
-      sizeBytes: 2497280256,
-      minRamGb: 6,
-      supportsToolCalling: true,
-      supportsThinking: true,
-      preset: ModelPreset(
-        temperature: 0.6,
-        topK: 20,
-        topP: 0.95,
-        contextSize: 8192,
-        maxTokens: 4096,
-      ),
-    ),
     DownloadableModel(
       name: 'Meta-Llama 3.1 8B Instruct',
       description:
@@ -308,5 +322,28 @@ class DownloadableModel {
         maxTokens: 2048,
       ),
     ),
+    DownloadableModel(
+      name: 'Qwen3.5 9B Instruct',
+      description:
+          '🆕 Qwen3.5 9B VLM (6.32GB bundle) • Highest-quality Qwen option with thinking + tools.',
+      url:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/Qwen_Qwen3.5-9B-Q4_K_M.gguf?download=true',
+      filename: 'Qwen_Qwen3.5-9B-Q4_K_M.gguf',
+      mmprojUrl:
+          'https://huggingface.co/bartowski/Qwen_Qwen3.5-9B-GGUF/resolve/main/mmproj-Qwen_Qwen3.5-9B-f16.gguf?download=true',
+      mmprojFilename: 'mmproj-Qwen_Qwen3.5-9B-f16.gguf',
+      sizeBytes: 6784286240,
+      minRamGb: 12,
+      supportsVision: true,
+      supportsToolCalling: true,
+      supportsThinking: true,
+      preset: ModelPreset(
+        temperature: 0.6,
+        topK: 20,
+        topP: 0.95,
+        contextSize: 8192,
+        maxTokens: 1024,
+      ),
+    ),
   ];
 }
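
The "bundle" sizes quoted in the Qwen3.5 descriptions line up with the `sizeBytes` values when read in binary units (1 GB taken as 1024^3 bytes). The sketch below checks that arithmetic. `formatBinarySize` is a hypothetical helper, not part of the app, and treating `sizeBytes` as the combined size of model plus `mmproj` is an inference from the word "bundle", not something the diff states.

```dart
/// Hypothetical helper: formats a byte count using binary units
/// (1 MB = 1024^2 bytes, 1 GB = 1024^3 bytes in this sketch).
String formatBinarySize(int bytes) {
  const mib = 1024 * 1024;
  const gib = 1024 * mib;
  if (bytes >= gib) {
    return '${(bytes / gib).toStringAsFixed(2)}GB';
  }
  return '${(bytes / mib).toStringAsFixed(0)}MB';
}

void main() {
  // sizeBytes values copied from the Qwen3.5 entries in the diff above.
  const sizes = {
    '0.8B': 754903104,
    '2B': 1983861664,
    '4B': 3529359968,
    '9B': 6784286240,
  };
  sizes.forEach((name, bytes) {
    print('$name: ${formatBinarySize(bytes)}');
  });
  // Prints:
  // 0.8B: 720MB
  // 2B: 1.85GB
  // 4B: 3.29GB
  // 9B: 6.32GB
}
```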

example/chat_app/lib/services/chat_generation_service.dart

Lines changed: 30 additions & 1 deletion
@@ -246,13 +246,42 @@ class ChatGenerationService {
 
     final backlog = targetText.length - currentText.length;
     final revealStep = _revealStepForBacklog(backlog);
-    final nextLength = currentText.length + revealStep;
+    var nextLength = currentText.length + revealStep;
     if (nextLength >= targetText.length) {
       return targetText;
     }
+
+    nextLength = _alignToUtf16Boundary(targetText, nextLength);
+    if (nextLength >= targetText.length) {
+      return targetText;
+    }
+
     return targetText.substring(0, nextLength);
   }
 
+  int _alignToUtf16Boundary(String text, int end) {
+    if (end <= 0 || end >= text.length) {
+      return end;
+    }
+
+    final previousCodeUnit = text.codeUnitAt(end - 1);
+    final nextCodeUnit = text.codeUnitAt(end);
+    if (_isLeadingSurrogate(previousCodeUnit) &&
+        _isTrailingSurrogate(nextCodeUnit)) {
+      return end + 1;
+    }
+
+    return end;
+  }
+
+  bool _isLeadingSurrogate(int codeUnit) {
+    return codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
+  }
+
+  bool _isTrailingSurrogate(int codeUnit) {
+    return codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;
+  }
+
   int _revealStepForBacklog(int backlog) {
     if (backlog <= 12) {
       return backlog;
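
To see the effect of the boundary alignment in isolation, the sketch below re-creates the same check as a standalone function and drives it with a simplified reveal loop. `alignToBoundary` mirrors the logic of `_alignToUtf16Boundary` from the hunk above; `revealFrames` and its fixed `step` are assumptions made for illustration and do not reproduce the app's `_revealStepForBacklog` schedule.

```dart
bool isLead(int u) => u >= 0xD800 && u <= 0xDBFF;
bool isTrail(int u) => u >= 0xDC00 && u <= 0xDFFF;

/// Moves [end] forward by one code unit when cutting at [end] would
/// separate a leading surrogate from its trailing surrogate.
int alignToBoundary(String text, int end) {
  if (end <= 0 || end >= text.length) return end;
  if (isLead(text.codeUnitAt(end - 1)) && isTrail(text.codeUnitAt(end))) {
    return end + 1;
  }
  return end;
}

/// Hypothetical reveal loop: returns each prefix of [target] that would be
/// shown when revealing [step] code units at a time.
List<String> revealFrames(String target, int step) {
  if (step <= 0) {
    throw ArgumentError.value(step, 'step', 'must be positive');
  }
  final frames = <String>[];
  var shown = 0;
  while (shown < target.length) {
    var next = shown + step;
    next = next >= target.length
        ? target.length
        : alignToBoundary(target, next);
    frames.add(target.substring(0, next));
    shown = next;
  }
  return frames;
}

void main() {
  const target = 'a\u{1F600}b'; // 4 code units: a, lead, trail, b
  final frames = revealFrames(target, 2);
  // Without alignment the first frame would end after the leading surrogate
  // (length 2); with alignment it extends to include the trailing half.
  print(frames.map((f) => f.length).toList()); // [3, 4]
}
```

Extending the cut forward instead of backward means a frame never shrinks below the requested step, at the cost of occasionally revealing one extra code unit.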
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+String sanitizeForTextLayout(String input) {
+  if (input.isEmpty) {
+    return input;
+  }
+
+  final units = input.codeUnits;
+  final output = StringBuffer();
+
+  for (var i = 0; i < units.length; i++) {
+    final current = units[i];
+
+    if (_isLeadingSurrogate(current)) {
+      if (i + 1 < units.length && _isTrailingSurrogate(units[i + 1])) {
+        output.writeCharCode(current);
+        output.writeCharCode(units[i + 1]);
+        i++;
+      } else {
+        output.writeCharCode(0xFFFD);
+      }
+      continue;
+    }
+
+    if (_isTrailingSurrogate(current)) {
+      output.writeCharCode(0xFFFD);
+      continue;
+    }
+
+    output.writeCharCode(current);
+  }
+
+  return output.toString();
+}
+
+bool _isLeadingSurrogate(int codeUnit) {
+  return codeUnit >= 0xD800 && codeUnit <= 0xDBFF;
+}
+
+bool _isTrailingSurrogate(int codeUnit) {
+  return codeUnit >= 0xDC00 && codeUnit <= 0xDFFF;
+}
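
The sanitizer's contract (valid pairs pass through, each unpaired half becomes U+FFFD) can be exercised with a handful of inputs. The commit mentions unit coverage, but the test file is not shown here, so the following is only a sketch of what such checks might look like. `sanitizeSketch` is a hypothetical, condensed stand-in built on `String.runes`; it is intended to behave like `sanitizeForTextLayout` for these cases but is not the commit's implementation.

```dart
/// Hypothetical stand-in: replaces unpaired surrogates with U+FFFD.
String sanitizeSketch(String input) {
  final out = StringBuffer();
  for (final rune in input.runes) {
    // `runes` combines valid pairs into code points above U+FFFF, so any
    // value still inside the surrogate range is an unpaired half.
    final isLoneSurrogate = rune >= 0xD800 && rune <= 0xDFFF;
    out.writeCharCode(isLoneSurrogate ? 0xFFFD : rune);
  }
  return out.toString();
}

void main() {
  final lead = String.fromCharCode(0xD83D); // Leading half of U+1F600.
  final trail = String.fromCharCode(0xDE00); // Trailing half of U+1F600.

  final cases = <String, String>{
    'plain': 'plain',
    'hi $lead$trail': 'hi $lead$trail', // Valid pair is preserved.
    'cut $lead': 'cut \uFFFD', // Dangling leading half.
    '$trail tail': '\uFFFD tail', // Dangling trailing half.
  };
  cases.forEach((input, expected) {
    print(sanitizeSketch(input) == expected); // true for every case
  });
}
```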
