Commit 5b0196c
fix(whisper): scrub invalid UTF-8 from segment text before protobuf marshal
whisper.cpp can emit segment text that is not valid UTF-8, typically
when a multibyte codepoint is split across a token boundary. protobuf
string fields reject such bytes at marshal time, which would surface as
a transcribe failure. Run strings.ToValidUTF8 on the segment text before
it leaves the cgo boundary, so each run of invalid bytes is replaced
with U+FFFD.
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-7 [Claude Code]

1 parent c8d63a1, commit 5b0196c
1 file changed
Lines changed: 4 additions & 1 deletion
Diff (line content not captured): original line 142 was replaced by new lines 142-145; unchanged context at original lines 139-141 and 143-145 (new lines 139-141 and 146-148).