Commit 39b9b52

fix(autocorrect): route typo-of-contraction-base through alias map (#101)
Follow-up to Tier A: covers the case where the new adjacency-aware dict scan converges on an alias-keyed bare form (`dont`, `hadnt`, `couldnt`, etc.) instead of the contracted target.

Before this change, typing `hadnr` (r↔t adjacent) returned `have` because:

1. `have` won the freq tiebreaker: the binary dict scale gives it ≈ 1M vs `hadnt`'s 5000 after `loadPrimaryContractionKeys` overwrites the existing freq with `currentDict[withApostrophe] ?: 5000` (the apostrophe form is never in the dict, so the 5000 anchor is unconditional).
2. Even when the alias key was the score-wise best match, the `autoCorrect` return path only consulted `contractionAliases` at step 0 (on the typed input), never on the dict-scan winner.

Fix routes through two layers:

- `ALIAS_KEY_FLOOR_FREQUENCY` (`Int.MAX_VALUE / 2`): alias-key candidates that clear the score threshold are floored above any non-alias freq, regardless of which loader populated the dict (JSON ≈ 100-10k, binary ≈ 5500-1M). Among multiple alias keys competing on the same typo (`couldnr` matches both `couldnt` and `couldve`), a `score * 1000` offset added to the floor makes the higher-scoring candidate win deterministically. This eliminates the hash-map iteration-order race that previously picked `could've` over `couldn't`.
- End-of-scan alias re-route: after `bestCandidate` is picked, a `contractionAliases[winner]` lookup substitutes the contracted form. Capitalization rules from step 0 (I-prefix → "I'm"; else `preserveCapitalization`) are reused.

Tests (3 new in AutocorrectTest, all instrumented):

- `aliasDirect_hadntIsLoaded`: sanity probe that the alias map is populated for this `@Before` setup (it was unclear earlier whether `loadDictionary` triggered contractions loading in tests).
- `contractionBaseTypo_hadnrToHadntContracted`: `hadnr → hadn't` via the new scan-then-reroute path.
- `contractionBaseTypo_couldnrToCouldntContracted`: `couldnr → couldn't`, validating the score tiebreaker among multiple alias-key candidates (`couldnt` vs `couldve`).

Verified: 1231 pure + 194 mock + 1279 instrumented tests pass.

-- opus 4.7
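The first layer of the fix can be illustrated with a small standalone model. This is a sketch, not the code in `WordPredictor.kt`: `ScanCandidate`, `pickByRawFrequency`, `pickWithAliasFloor`, and the sample data are hypothetical names introduced here, the frequencies follow the figures quoted in the commit message, and the scores are made up for illustration.

```kotlin
// Simplified, hypothetical model of the dict-scan tiebreaker. Names other than
// ALIAS_KEY_FLOOR_FREQUENCY are illustrative and not taken from WordPredictor.kt.
data class ScanCandidate(val word: String, val score: Float, val frequency: Int)

// The commit declares this as `private const val`; a plain val keeps the sketch portable.
val ALIAS_KEY_FLOOR_FREQUENCY = Int.MAX_VALUE / 2

// Old rule: among candidates that cleared the score threshold, raw frequency wins.
fun pickByRawFrequency(candidates: List<ScanCandidate>): String? =
    candidates.maxByOrNull { it.frequency }?.word

// New rule: alias keys are lifted to the floor plus a score-scaled offset, so they
// outrank every non-alias word and are ordered by score among themselves.
fun effectiveFrequency(c: ScanCandidate, aliases: Map<String, String>): Int =
    if (c.word in aliases) ALIAS_KEY_FLOOR_FREQUENCY + (c.score * 1000f).toInt()
    else c.frequency

fun pickWithAliasFloor(candidates: List<ScanCandidate>, aliases: Map<String, String>): String? =
    candidates.maxByOrNull { effectiveFrequency(it, aliases) }?.word

// Illustrative data: frequencies follow the commit message, scores are invented.
val sampleAliases = mapOf("hadnt" to "hadn't", "couldnt" to "couldn't", "couldve" to "could've")
val hadnrCandidates = listOf(
    ScanCandidate("have", 0.90f, 1_000_000),
    ScanCandidate("hadnt", 0.97f, 5_000),
)
val couldnrCandidates = listOf(
    ScanCandidate("couldve", 0.90f, 5_000),
    ScanCandidate("couldnt", 0.97f, 5_000),
)
```

With this data, `pickByRawFrequency(hadnrCandidates)` selects `have`, while `pickWithAliasFloor(hadnrCandidates, sampleAliases)` selects `hadnt`. For `couldnrCandidates`, both entries are alias keys with equal raw frequency, so the score-scaled offset is what decides in favor of `couldnt`.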
1 parent 2213f3b commit 39b9b52

2 files changed

Lines changed: 101 additions & 5 deletions

src/androidTest/kotlin/tribixbite/cleverkeys/AutocorrectTest.kt

Lines changed: 49 additions & 0 deletions
@@ -217,6 +217,55 @@ class AutocorrectTest {
         assertTrue("Min length should be reasonable", minLength >= 0 && minLength <= 5)
     }
 
+    // =========================================================================
+    // Issue #101 follow-up: typo of a contraction base must autocorrect to
+    // the CONTRACTED form, not the bare alias key. Reported 2026-05-21:
+    // Tier A's adjacency-aware dict scan now reaches alias-injected `dont` /
+    // `im` / `youre` entries via near-miss typos, but autoCorrect returned
+    // the bare alias key because the contractionAliases lookup only fired on
+    // the typed input, not on the dict-scan winner. Fix: re-route the winner
+    // through contractionAliases before returning.
+    // =========================================================================
+
+    @Test
+    fun testAutocorrect_aliasDirect_hadntIsLoaded() {
+        // Sanity probe: confirms `contractionAliases` is populated for this
+        // test's @Before setup. If this fails the typo-of-base tests below
+        // can't possibly work (boost is gated on `dictWord in aliases`).
+        config.autocorrect_enabled = true
+        val result = predictor.autoCorrect("hadnt")
+        assertEquals("alias direct path: hadnt should map to hadn't via step 0",
+            "hadn't", result)
+    }
+
+    @Test
+    fun testAutocorrect_contractionBaseTypo_hadnrToHadntContracted() {
+        // `hadnr` is a 5-char typo of `hadnt` (t→r, both top-row adjacent).
+        // `hadnt` is alias-injected mapping to "hadn't". The `hadn?` prefix
+        // and ≥ 50% exact ratio leave `hadnt` (alias-keyed) as the only
+        // viable winner: no high-freq 5-char competitor matches the
+        // `had?r` shape with exactRatio ≥ 0.5.
+        // Expected: the dict-scan winner `hadnt` is re-routed through the
+        // alias map to yield "hadn't".
+        config.autocorrect_enabled = true
+        config.autocorrect_prefix_length = 0
+        val result = predictor.autoCorrect("hadnr")
+        assertEquals("hadnr → hadn't (alias re-routed dict-scan winner)",
+            "hadn't", result)
+    }
+
+    @Test
+    fun testAutocorrect_contractionBaseTypo_couldnrToCouldntContracted() {
+        // `couldnr` is a 7-char typo of `couldnt` (t→r, adjacent). With
+        // 7 chars the prefix constraint plus exactRatio ≥ 0.5 leaves only
+        // `couldnt` as a candidate among 7-char dict words.
+        config.autocorrect_enabled = true
+        config.autocorrect_prefix_length = 0
+        val result = predictor.autoCorrect("couldnr")
+        assertEquals("couldnr → couldn't (alias re-routed dict-scan winner)",
+            "couldn't", result)
+    }
+
     // =========================================================================
     // Config settings tests
     // =========================================================================
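The test comments rely on `r` and `t` being adjacent on the top keyboard row. The following is a minimal sketch of that idea only. Tier A's actual adjacency model is not shown in this commit, so `QWERTY_ROWS`, `areRowAdjacent`, and `isSingleAdjacentSubstitution` are hypothetical helpers that consider same-row QWERTY neighbours and nothing else.

```kotlin
// Hypothetical same-row adjacency model; not the adjacency logic used by Tier A.
val QWERTY_ROWS = listOf("qwertyuiop", "asdfghjkl", "zxcvbnm")

// True when both keys sit on the same row, exactly one position apart.
fun areRowAdjacent(a: Char, b: Char): Boolean {
    val x = a.lowercaseChar()
    val y = b.lowercaseChar()
    return QWERTY_ROWS.any { row ->
        val i = row.indexOf(x)
        val j = row.indexOf(y)
        i >= 0 && j >= 0 && kotlin.math.abs(i - j) == 1
    }
}

// True when `typed` differs from `target` in exactly one position and the two
// differing characters are row-adjacent (a single adjacent-key substitution).
fun isSingleAdjacentSubstitution(typed: String, target: String): Boolean {
    if (typed.length != target.length) return false
    val diffs = typed.indices.filter { typed[it] != target[it] }
    return diffs.size == 1 && areRowAdjacent(typed[diffs[0]], target[diffs[0]])
}
```

Under this model `hadnr` is a single adjacent substitution away from `hadnt`, and `couldnr` from `couldnt`, whereas `couldnr` differs from `couldve` in two positions.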

src/main/kotlin/tribixbite/cleverkeys/WordPredictor.kt

Lines changed: 52 additions & 5 deletions
@@ -70,6 +70,28 @@ class WordPredictor {
          * - `wuestion → something` (lenDiff=1, ed≈2.71): 2.71 > 1.5 ✗
          */
         private const val LENGTH_DIFF_ED_BUDGET = 0.5f
+
+        /**
+         * Floor frequency assigned to alias-key candidates (bare-form
+         * contractions like `dont`, `cant`, `hadnt`) during the dict-scan
+         * tiebreaker. Sized at `Int.MAX_VALUE / 2` so the alias-key wins
+         * against ANY non-alias candidate regardless of which freq scale
+         * is in use (JSON path: ≈100–10k, binary path: ≈5500–1,000,000).
+         *
+         * Product intent: when a typo is a near-match to a contraction
+         * base AND clears the score threshold, the contracted form is the
+         * far more likely intent than a similarly-scored common word:
+         * `donr → don't` beats `donr → done` because users typing `donr`
+         * almost always meant `don't`. Tied/near-tied scores are the norm
+         * for typos: a single adjacent-key substitution lands at score ≈
+         * 0.97 for many candidate words simultaneously, and the old
+         * frequency tiebreaker silently picked the wrong winner.
+         *
+         * Halved from `Int.MAX_VALUE` so two aliases competing in the
+         * same scan don't overflow into ambiguous wraparound, though
+         * that case is itself a corner (most typos match only one base).
+         */
+        private const val ALIAS_KEY_FLOOR_FREQUENCY = Int.MAX_VALUE / 2
         private const val MAX_EDIT_DISTANCE = 2
         private const val MAX_RECENT_WORDS = 20 // Keep last 20 words for language detection
         private const val PREFIX_INDEX_MAX_LENGTH = 3 // Index prefixes up to 3 chars
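The arithmetic behind the halved floor can be checked in isolation. The sketch below assumes scores lie in `0f..1f` (the KDoc only mentions scores around 0.97). `aliasEffectiveFrequency`, `unhalvedEffectiveFrequency`, and `BINARY_DICT_MAX_FREQUENCY` are hypothetical names used for illustration.

```kotlin
// Mirrors the constant from the diff; declared there as `private const val`.
val ALIAS_KEY_FLOOR_FREQUENCY = Int.MAX_VALUE / 2   // 1_073_741_823

// Effective frequency of an alias key, using the same expression as the scan loop.
// Assumes score is in 0f..1f, so the offset is at most 1000.
fun aliasEffectiveFrequency(score: Float): Int =
    ALIAS_KEY_FLOOR_FREQUENCY + (score * 1000f).toInt()

// Hypothetical alternative with the floor at Int.MAX_VALUE: any positive offset
// wraps around to a negative Int, because Kotlin Int arithmetic overflows silently.
fun unhalvedEffectiveFrequency(score: Float): Int =
    Int.MAX_VALUE + (score * 1000f).toInt()

// Upper end of the binary-dictionary scale quoted in the KDoc (approximate).
val BINARY_DICT_MAX_FREQUENCY = 1_000_000
```

With the halved floor, `aliasEffectiveFrequency(1.0f)` is `1_073_742_823`, which stays far below `Int.MAX_VALUE` and far above the largest dictionary frequency. With an unhalved floor, the same offset would produce a negative value, and the alias key would lose every comparison.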
@@ -1845,17 +1867,42 @@ class WordPredictor {
             }
 
             if (score >= charMatchThreshold) {
-                // Tiebreaker: higher dictionary frequency wins.
-                if (bestCandidate == null || candidateFrequency > bestCandidate.score) {
-                    bestCandidate = WordCandidate(dictWord, candidateFrequency)
+                // Tiebreaker: higher dictionary frequency wins, but alias-
+                // keys (bare-form contractions like `dont`, `cant`, `hadnt`)
+                // are floored at `ALIAS_KEY_FLOOR_FREQUENCY` so they always
+                // beat non-alias competitors. Among multiple alias-keys
+                // (e.g. `couldnr` matches both `couldnt` AND `couldve`),
+                // the higher-scoring candidate wins via a score-scaled
+                // offset added to the floor. Without the offset, hash-map
+                // iteration order silently picks the wrong contraction.
+                val effectiveFrequency =
+                    if (dictWord in contractionAliases) {
+                        ALIAS_KEY_FLOOR_FREQUENCY + (score * 1000f).toInt()
+                    } else {
+                        candidateFrequency
+                    }
+                if (bestCandidate == null || effectiveFrequency > bestCandidate.score) {
+                    bestCandidate = WordCandidate(dictWord, effectiveFrequency)
                 }
             }
         }
 
         // 5. Apply correction only if confident candidate found.
         if (bestCandidate != null && bestCandidate.score >= frequencyFloor) {
-            val corrected = preserveCapitalization(typedWord, bestCandidate.word)
-            Log.d(TAG, "AUTO-CORRECT: '$typedWord' → '$corrected' (freq=${bestCandidate.score})")
+            // Re-route alias-keyed winners through contractionAliases so the
+            // returned form is the apostrophe-bearing contraction. Without
+            // this, `donr → dont` (the alias-key) would stop there; the
+            // user-visible result must be `don't`. The same I-capitalization
+            // rule from step 0 applies.
+            val winnerWord = bestCandidate.word
+            val aliasTarget = contractionAliases[winnerWord]
+            val outputWord = aliasTarget ?: winnerWord
+            val corrected = if (aliasTarget != null && aliasTarget.startsWith("i'")) {
+                aliasTarget.replaceFirstChar { it.uppercase() }
+            } else {
+                preserveCapitalization(typedWord, outputWord)
+            }
+            Log.d(TAG, "AUTO-CORRECT: '$typedWord' → '$corrected' (winner=$winnerWord, freq=${bestCandidate.score})")
             return corrected
         }
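The end-of-scan re-route can be exercised outside the predictor with a small stand-in. This is a sketch under stated assumptions: `rerouteWinner` and `rerouteAliases` are hypothetical names, and `preserveCapitalizationSketch` is only a guess at first-letter behaviour, since the body of the real `preserveCapitalization` does not appear in this diff.

```kotlin
// Stand-in for the project's preserveCapitalization (assumption: it carries the
// typed word's leading capital over to the correction; the real body is not shown).
fun preserveCapitalizationSketch(typed: String, corrected: String): String =
    if (typed.firstOrNull()?.isUpperCase() == true) {
        corrected.replaceFirstChar { it.uppercase() }
    } else {
        corrected
    }

// Mirrors the re-route in the diff: substitute the contracted form when the
// winner is an alias key, and always capitalize contractions that start with "i'".
fun rerouteWinner(typed: String, winner: String, aliases: Map<String, String>): String {
    val aliasTarget = aliases[winner]
    val outputWord = aliasTarget ?: winner
    return if (aliasTarget != null && aliasTarget.startsWith("i'")) {
        aliasTarget.replaceFirstChar { it.uppercase() }
    } else {
        preserveCapitalizationSketch(typed, outputWord)
    }
}

// Illustrative alias map; the real contractionAliases is loaded by the predictor.
val rerouteAliases = mapOf("hadnt" to "hadn't", "couldnt" to "couldn't", "im" to "i'm")
```

For example, `rerouteWinner("hadnr", "hadnt", rerouteAliases)` yields `hadn't`, a capitalized `Hadnr` yields `Hadn't`, a winner of `im` yields `I'm` regardless of how the input was cased, and a non-alias winner such as `the` passes through unchanged.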
