Commit 7ff0c5f
feat(autocorrect): Tier B layout-aware adjacency + accents + freq fix
Three coordinated changes to the adjacency-aware autocorrect system,
plus the fix to the freq-downgrade bug surfaced by Phase A
investigation.
## KeyAdjacency — Tier B + accented coverage
- Replaces hardcoded `MAX_DISTANCE = q↔m = 7.28` (which was
mathematically wrong — true max is `q↔p = 9.0`, same-row opposite
ends) with pairwise `computeMaxDistance` that adapts to the active
position table.
- Adds `setLayout(positions)` / `resetLayout()` so non-QWERTY layouts
(AZERTY, QWERTZ, Dvorak, Colemak, user-custom) get adjacency-aware
autocorrect using their REAL physical key positions. `@Volatile`
snapshot pattern (`val p = positions` at the top of `keyDistance`)
is the lightweight thread-safety primitive — `Keyboard2View.onLayout`
pushes from the UI thread, autocorrect reads from the prediction
thread.
- Adds common accented Latin chars (`á à â ä ã å é è ê ë í ì î ï
ó ò ô ö õ ø ú ù û ü ñ ç ß ý ÿ`) to the default position table
mapped to their unaccented base's position. de/fr/es/it/pt/sv users
now get adjacency credit on accent typos (`cafe ↔ café` cost ~0).
When the user IS on a layout with dedicated accent keys (German ü/ö/ä,
French é/è/à), `setLayout` overrides with the real key positions.
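The pieces above (pairwise maximum, swappable layout, single volatile read, accents aliased to their base key) can be sketched together on a plain JVM. This is a minimal illustration, not the code from this commit: `Pos`, `LayoutSnapshot`, and `KeyAdjacencySketch` are hypothetical names, the grid offsets are assumptions chosen so that `q↔m` and `q↔p` reproduce the 7.28 and 9.0 figures quoted above, and bundling the table with its maximum in one snapshot object is a choice of this sketch.

```kotlin
import kotlin.math.hypot

// Hypothetical stand-in for android.graphics.PointF so the sketch runs on a plain JVM.
data class Pos(val x: Float, val y: Float)

fun distance(a: Pos, b: Pos): Float = hypot(a.x - b.x, a.y - b.y)

// Pairwise maximum over the given table. O(n^2), but n is only a few dozen keys.
fun computeMaxDistance(table: Map<Char, Pos>): Float {
    val pts = table.values.toList()
    var max = 0f
    for (i in pts.indices) {
        for (j in i + 1 until pts.size) {
            max = maxOf(max, distance(pts[i], pts[j]))
        }
    }
    return max
}

// Immutable bundle so a reader always sees a table together with its own maximum.
class LayoutSnapshot(val positions: Map<Char, Pos>) {
    val maxDistance: Float = computeMaxDistance(positions)
}

object KeyAdjacencySketch {
    // Assumed QWERTY grid: one unit per key, row offsets of 0, 0.5 and 1.
    private val DEFAULT = LayoutSnapshot(buildMap<Char, Pos> {
        "qwertyuiop".forEachIndexed { i, c -> put(c, Pos(i.toFloat(), 0f)) }
        "asdfghjkl".forEachIndexed { i, c -> put(c, Pos(i + 0.5f, 1f)) }
        "zxcvbnm".forEachIndexed { i, c -> put(c, Pos(i + 1f, 2f)) }
        // A few accented characters mapped onto their unaccented base key.
        for ((accented, base) in listOf('é' to 'e', 'ü' to 'u', 'ñ' to 'n', 'ç' to 'c')) {
            put(accented, getValue(base))
        }
    })

    // Written by the UI thread, read by the prediction thread.
    @Volatile private var layout: LayoutSnapshot = DEFAULT

    // Assumption: an injected table replaces the default; an empty map reverts to it.
    fun setLayout(positions: Map<Char, Pos>) {
        layout = if (positions.isEmpty()) {
            DEFAULT
        } else {
            LayoutSnapshot(positions.mapKeys { it.key.lowercaseChar() })
        }
    }

    fun resetLayout() {
        layout = DEFAULT
    }

    fun maxDistance(): Float = layout.maxDistance

    // Returns null when either key is missing from the active table.
    fun keyDistance(a: Char, b: Char): Float? {
        val snap = layout // read the volatile field once, then use only the local
        val pa = snap.positions[a.lowercaseChar()] ?: return null
        val pb = snap.positions[b.lowercaseChar()] ?: return null
        return distance(pa, pb)
    }
}

fun main() {
    println(KeyAdjacencySketch.maxDistance())          // 9.0 on the assumed grid (q↔p)
    println(KeyAdjacencySketch.keyDistance('q', 'm'))  // about 7.28
    println(KeyAdjacencySketch.keyDistance('e', 'é'))  // 0.0, accent shares the base key
    KeyAdjacencySketch.setLayout(mapOf('A' to Pos(0f, 0f), 'Q' to Pos(3f, 4f)))
    println(KeyAdjacencySketch.maxDistance())          // 5.0, adapts to the injected table
    KeyAdjacencySketch.resetLayout()
}
```

Reading the volatile field into a local once means a concurrent `setLayout` cannot swap the table between the two lookups inside `keyDistance`.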
## Keyboard2View.onLayout integration
`Keyboard2View` already exposed `getRealKeyPositions(): Map<Char, PointF>`, but
nothing called it. This commit wires it up: after layout finalizes, `onLayout`
calls `KeyAdjacency.setLayout(realPositions)` so the active layout drives
autocorrect's distance model. The call sits in a defensive try/catch so a
position-table exception can't kill IME boot.
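The defensive wrapper described above might look roughly like the following. It is a sketch under assumptions: `pushLayoutSafely`, `Point`, and the logging lambda are hypothetical, and the two function parameters stand in for `getRealKeyPositions()` and `KeyAdjacency.setLayout` so the example runs without Android.

```kotlin
// Hypothetical stand-in for android.graphics.PointF.
data class Point(val x: Float, val y: Float)

// Returns true when the positions were pushed, false when a failure was swallowed.
fun pushLayoutSafely(
    readPositions: () -> Map<Char, Point>,
    applyLayout: (Map<Char, Point>) -> Unit,
    log: (String) -> Unit = { println(it) }
): Boolean = try {
    applyLayout(readPositions())
    true
} catch (e: Exception) {
    // The previous table stays active; the keyboard keeps working.
    log("layout push failed, keeping previous table: ${e.message}")
    false
}

fun main() {
    var active: Map<Char, Point> = emptyMap()
    val ok = pushLayoutSafely({ mapOf('q' to Point(0f, 0f)) }, { active = it })
    val failed = pushLayoutSafely({ error("view not measured yet") }, { active = it })
    println("$ok $failed ${active.keys}") // true false [q]
}
```

On failure the previously applied table remains in place, which is the property the try/catch is there to protect.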
## WordPredictor selection logic — hybrid score/freq tiebreaker
Replaces the previous "alias-key floor frequency = Int.MAX_VALUE / 2"
mechanism (which overrode clearly-better non-alias matches) with a
four-tier comparator on a new `AutocorrectCandidate(word, score, freq)`:
1. Big score gap (> 0.10) → score wins. Stops `wuestion → within`
(lenDiff=2, score 0.69 vs `question` 0.99) and `tge → weve`
(alias-keyed but score 0.66 vs `the` 0.96).
2. Alias vs alias → raw score (structural closeness) wins. `hadnr`
means `hadnt` not `hasnt` because hadnt is a single adj sub
(0.978) vs hasnt's two subs (0.956), regardless of either's
dict frequency.
3. Close score (within gap), non-alias → freq wins. Stops
`tfe → tfw` (tfw scores 0.96 by 1 adj sub, but `the` at 0.93
freq-beats it 255 vs 162). Same fix lands `questin → question`
over `quentin`, and `quuestion → question` over `quotation`.
4. Everything tied → deterministic by raw score (no hash-map
iteration-order races).
Alias-keys get a +0.15 score bonus so they clear the 0.10 gap on
otherwise-tied candidates (`donr → don't` wins by score, not
hash-map order). `WordCandidate(word, score: Int)` is kept distinct
for the prediction-path use (`predictWords`), which still uses Int
unified scores from `calculateUnifiedScore` — the Float score in
`AutocorrectCandidate` is the [0,1] match quality from KeyAdjacency,
not the same scale at all.
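The four tiers and the alias bonus can be sketched as a pairwise selection rule. Several things here are assumptions rather than the commit's code: the `isAlias` flag (the message gives only `word, score, freq`), applying the bonus before the tier-1 gap check, falling back to frequency when an alias and a non-alias land within the gap, and the final tie-break on the word itself. Scores come from the examples above; frequencies other than 255 and 162 are placeholders.

```kotlin
import kotlin.math.abs

const val SCORE_GAP = 0.10f
const val ALIAS_BONUS = 0.15f

// Hypothetical shape: isAlias is not part of the signature given in the message.
data class Candidate(
    val word: String,
    val score: Float,
    val freq: Int,
    val isAlias: Boolean = false
) {
    val boosted: Float get() = if (isAlias) score + ALIAS_BONUS else score
}

fun better(a: Candidate, b: Candidate): Candidate {
    val gap = a.boosted - b.boosted
    // Tier 1: a big gap in (boosted) score decides outright.
    if (abs(gap) > SCORE_GAP) return if (gap > 0) a else b
    // Tier 2: alias vs alias, structural closeness (raw score) wins.
    if (a.isAlias && b.isAlias && a.score != b.score) return if (a.score > b.score) a else b
    // Tier 3: close scores, frequency wins.
    if (a.freq != b.freq) return if (a.freq > b.freq) a else b
    // Tier 4: raw score, then the word itself so the result never depends on map order.
    if (a.score != b.score) return if (a.score > b.score) a else b
    return if (a.word <= b.word) a else b
}

fun pick(vararg candidates: Candidate): Candidate =
    candidates.reduce { best, next -> better(best, next) }

fun main() {
    // Placeholder frequencies unless noted.
    println(pick(Candidate("within", 0.69f, 900), Candidate("question", 0.99f, 500)).word) // question
    println(pick(Candidate("weve", 0.66f, 300, isAlias = true), Candidate("the", 0.96f, 255)).word) // the
    println(pick(Candidate("hasnt", 0.956f, 900, isAlias = true),
                 Candidate("hadnt", 0.978f, 500, isAlias = true)).word) // hadnt
    // 255 vs 162 are the frequencies quoted in the message.
    println(pick(Candidate("tfw", 0.96f, 162), Candidate("the", 0.93f, 255)).word) // the
}
```

The sketch folds candidates with `reduce` instead of handing the rule to a sort, because a threshold-based rule like tier 1 is not guaranteed to be transitive.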
## Freq-downgrade fix in alias-injection
`loadPrimaryContractionKeys` (line 1024) and `loadContractionKeysIntoMaps`
(line 956) previously did:
    currentDict[withoutApostrophe] = currentDict[withApostrophe] ?: 5000
This was destructive: the apostrophe form is never in the dict
(en_enhanced.json contains no apostrophe entries), so the `?:` ALWAYS
fell through to 5000 — silently downgrading `hadnt` from its binary-
loaded ≈789K freq to 5000. Fixed to preserve any existing freq:
    currentDict[withoutApostrophe] = currentDict[withApostrophe]
        ?: currentDict[withoutApostrophe]
        ?: 5000
Phase A investigation confirmed safe for beam search: OptimizedVocabulary
normalizes freqs to [0,1] then multiplies by `Config.neural_frequency_weight`
(user-tunable, default 0.57), so higher input freq → slightly better
ranking but no breakage. Beam search reads through `OptimizedVocabulary.WordInfo`
not raw dict, so no scale-mixing risk.
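The downgrade in the alias-injection code is easy to reproduce in isolation. A minimal sketch, assuming a plain `Map<String, Int>` in place of the real dictionary; the helper names are hypothetical and 789_000 is a rounded stand-in for the ≈789K figure:

```kotlin
const val FALLBACK_FREQ = 5000

// Old behaviour: only the apostrophe form is consulted.
fun aliasFreqOld(dict: Map<String, Int>, withApostrophe: String): Int =
    dict[withApostrophe] ?: FALLBACK_FREQ

// Fixed behaviour: keep whatever the apostrophe-free form already has.
fun aliasFreqFixed(dict: Map<String, Int>, withApostrophe: String, withoutApostrophe: String): Int =
    dict[withApostrophe] ?: dict[withoutApostrophe] ?: FALLBACK_FREQ

fun main() {
    // No apostrophe entries, matching what the message says about en_enhanced.json.
    val dict = mapOf("hadnt" to 789_000)
    println(aliasFreqOld(dict, "hadn't"))              // 5000, the silent downgrade
    println(aliasFreqFixed(dict, "hadn't", "hadnt"))   // 789000, existing freq preserved
    println(aliasFreqFixed(dict, "shan't", "shant"))   // 5000, fallback for unknown words
}
```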
## Tests
`KeyAdjacencyTest` (+11 cases, total 31):
- 7 accent tests covering é/ñ/ç/ü/ß/ý/ä.
- 4 layout-injection tests: replaces-default, AZERTY q/a swap,
empty-map reverts, case-insensitive normalization.
- Updates to 5 pre-existing tests for the corrected MAX = 9.0.
Verification: 1242 pure + 194 mock + 1279 instrumented all green.
— opus 4.71 parent 39b9b52 commit 7ff0c5f
4 files changed
Lines changed: 409 additions & 80 deletions
File tree
- src
- main/kotlin/tribixbite/cleverkeys
- autocorrect
- test/kotlin/tribixbite/cleverkeys/autocorrect