
Suggestions: fix fuzzy skip bug and add distance-2 diacritic fallback#4

Open
pashol wants to merge 1 commit into master from claude/improve-suggestions-diacritics-typos


pashol commented Mar 26, 2026

Owner

Summary

  • Fix fuzzy-skip off-by-one: `i + 1 >= max_count` → `i >= max_count` in `query_suggestions`. The old condition disabled distance-1 typo correction whenever only 1 suggestion slot remained, e.g. when UserDictionary had already filled 2 slots. Now fuzzy search fires whenever any slot is open.
  • Add distance-2 fallback for diacritics: the cdict `distance()` function works on raw UTF-8 bytes. `ö` encodes as 2 bytes, so `oppis` is byte-distance 2 from `öppis`, which is unreachable with the existing distance-1 call. A new block after the main loop calls `dict.distance(word, 2, remaining)` as a last-resort filler, guarded to pure-ASCII input (≥5 chars) so it only fires in the "user omitted an accent" scenario and doesn't produce noisy candidates otherwise.
  • New `is_pure_ascii` helper: O(n) guard used by the distance-2 block.
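
To make the byte-distance claim concrete, here is a minimal Python sketch (not the project's code). It assumes the cdict metric behaves like plain Levenshtein distance over UTF-8 bytes, which this PR does not confirm; `byte_distance` is a hypothetical name, and `is_pure_ascii` here is only an illustration of what such a helper could look like.

```python
def byte_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over UTF-8 bytes, not characters.

    Assumption: this approximates how a byte-oriented dictionary lookup
    would measure edits; the real cdict metric may differ.
    """
    x, y = a.encode("utf-8"), b.encode("utf-8")
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, bx in enumerate(x, start=1):
        cur = [i]
        for j, by in enumerate(y, start=1):
            cost = 0 if bx == by else 1
            cur.append(min(prev[j] + 1,          # delete a byte from x
                           cur[j - 1] + 1,       # insert a byte into x
                           prev[j - 1] + cost))  # substitute (or match)
        prev = cur
    return prev[-1]


def is_pure_ascii(word: str) -> bool:
    """O(n) check that every code point is below 128 (illustrative helper)."""
    return all(ord(ch) < 128 for ch in word)


# "ö" is the two bytes C3 B6, so reaching it from "o" takes one
# substitution plus one insertion at the byte level.
print(byte_distance("oppis", "öppis"))
print(is_pure_ascii("oppis"), is_pure_ascii("öppis"))
```

Under this assumption, a single-character accent omission costs two byte edits, which is why a distance-1 query cannot reach `öppis` from `oppis`.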

Test plan

  • Load Swiss German dictionary, type `oppis` → `öppis` appears as a suggestion
  • Add 2 personal words starting with `hon`, type `honeowner` → `homeowner` appears (previously blocked by the off-by-one)
  • Type a 3-4 char word with no matches → distance-2 block does NOT fire (length guard)
  • Type a word containing `ö` → distance-2 block does NOT fire (`is_pure_ascii` guard)
  • Normal typing of known words → no regression in suggestion quality

Two improvements to suggestion accuracy:

1. Fix off-by-one in fuzzy-skip guard: `i + 1 >= max_count` → `i >= max_count`.
   Previously, distance-1 fuzzy search was disabled whenever only 1 Cdict slot
   remained (e.g. when UserDictionary filled 2 slots), silently blocking typo
   correction in that case.

2. Add distance-2 fallback for diacritic substitutions. The cdict distance
   function operates on raw UTF-8 bytes; ö encodes as 2 bytes, so "oppis" is
   byte-distance 2 from "öppis" — out of reach for the existing distance-1
   call. The new block fires only for pure-ASCII input (≥5 chars) with empty
   suggestion slots, keeping noise low while surfacing accented candidates.
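
The two guards described above can be sketched in Python as follows. This is an illustration, not the patched code: it assumes `i` counts the suggestion slots already filled and that `max_count` is 3, and the function names and the `MIN_FALLBACK_LEN` constant are hypothetical.

```python
MIN_FALLBACK_LEN = 5  # assumed from the "≥5 chars" guard in the description


def fuzzy_skipped_old(i: int, max_count: int) -> bool:
    """Old guard: skips fuzzy search even when exactly one slot is still free."""
    return i + 1 >= max_count


def fuzzy_skipped_new(i: int, max_count: int) -> bool:
    """New guard: skips fuzzy search only when no slot is free."""
    return i >= max_count


def distance2_fallback_allowed(word: str, filled: int, max_count: int) -> bool:
    """Last-resort distance-2 lookup: needs a free slot, pure-ASCII input,
    and at least MIN_FALLBACK_LEN characters."""
    remaining = max_count - filled
    return remaining > 0 and len(word) >= MIN_FALLBACK_LEN and word.isascii()


# Two slots already filled out of three: the old guard blocks typo
# correction, the new one lets it use the last slot.
print(fuzzy_skipped_old(2, 3), fuzzy_skipped_new(2, 3))
print(distance2_fallback_allowed("oppis", 2, 3))
```

With these assumptions, short inputs, inputs that already contain a non-ASCII character, and full suggestion lists all bypass the distance-2 lookup, matching the cases listed in the test plan.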