The gloss examples rendered "1SG.NOM" as "1SGNOM" because the default
tokenSplitChars (".-|") consumes the period as a boundary and does not
draw it. Leipzig glosses pack features into one morpheme with periods
(go.PST.IPFV = one token), so the period must be preserved.

- Add settings.tokenSplitChars and settings.tokenMergeChar to the API
- Rework gloss examples to the verified interface config: gloss on top,
  source middle, translation bottom; tokenSplitChars "-|"; gloss-source
  pair arcs hidden with 12px gap; source-translation arcs shown
- Replace broken German 4-line example with a clean 4-tier French stack
  (gloss / IPA / source / translation)
- Document the "split char is not rendered" gotcha on /api
- Update OpenAPI schema, SKILL.md, references/api.md
- 3 new tests for tokenization settings (62 total)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
description: 'Tint word tokens in the color of their connection. Default: true.'
+},
+tokenSplitChars: {
+  type: 'string',
+  description:
+    'Characters (besides whitespace) that split text into separate word tokens. Default: ".-|". For Leipzig glosses set "-|" so periods stay inside a token (e.g. "go.PST.IPFV" is one token). The split character itself is not rendered.'
+},
+tokenMergeChar: {
+  type: 'string',
+  maxLength: 1,
+  description:
+    'Single character that joins parts into one token while rendering as a space (e.g. "is+playing" → "is playing", one word). Default: "+".'
word-aligner-skill/SKILL.md (15 additions & 9 deletions)
@@ -67,30 +67,36 @@ If uncertain about tokenization, call `GET https://aligner.tinygods.dev/api/alig
}
```

-**Interlinear gloss**: place the gloss line directly under the source (adjacent), connect source→gloss tokens, hide the arcs with `showConnectors: false`, use a small gap (12px). Put the free translation below the gloss with a larger gap and no connectors.
+**Interlinear (Leipzig) gloss** — three lines: gloss on top, source in the middle, free translation at the bottom.

-`"1SG.NOM PST.IPFV"` — dots are default split chars, so this yields 4 tokens: `1SG`[0]`NOM`[1]`PST`[2]`IPFV`[3].
+Important: a Leipzig gloss uses periods to pack grammatical features into one morpheme (`go.PST.IPFV` = "go" + past + imperfective, **one** token). The default `tokenSplitChars` is `".-|"`, which would split on the period and hide it — rendering `goPSTIPFV`. To keep the periods, set `"tokenSplitChars": "-|"` (drop the dot).
+
+Layout rules:
+- Gloss sits directly above the source: hide its connector arcs (`showConnectors: false`) and use a tight gap (`gapPx: 12`). The gloss tokens still get colors via their connections to the source.
+- The source→translation pair keeps its arcs (omit it from `pairs`).
+- Connect each gloss token to its source word, each source word to its translation word(s).
Gloss tokens inherit colors from their source-word group. Arcs are hidden on both pairs; only the color coding is visible.
+Line 0 (gloss) has 2 whitespace-separated tokens: `1SG.NOM`[0] and `go.PST.IPFV`[1]. "ходил" maps to "have been going" (one-to-many, shared color); the gloss above it is color-matched but arc-free.

## Full parameter reference

-See [references/api.md](references/api.md) for the complete parameter tables: `LineInput`, `SettingsInput` (palette, lineStyle, lineThickness, lineOpacity, background, theme, showNumbers, colorTokensByLink), and `PairInput` (gapPx, showConnectors).
+See [references/api.md](references/api.md) for the complete parameter tables: `LineInput`, `SettingsInput` (palette, lineStyle, lineThickness, lineOpacity, background, theme, showNumbers, colorTokensByLink, tokenSplitChars, tokenMergeChar), and `PairInput` (gapPx, showConnectors).
|`showNumbers`| boolean |`false`| Show line numbers next to each line |
|`colorTokensByLink`| boolean |`true`| Tint word tokens in the color of their connection |
+|`tokenSplitChars`| string |`.-\|`| Characters (besides whitespace) that split text into tokens. The split char is **not** rendered. Set to `-\|` to keep periods inside Leipzig gloss morphemes (`go.PST.IPFV` = one token) |
+|`tokenMergeChar`| string (1 char) |`+`| Joins parts into one token while rendering as a space, e.g. `is+playing` → `is playing` (one word) |
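
The split and merge semantics described in the two new rows can be sketched as a small tokenizer. This is a minimal illustration only, not the project's actual implementation: the function names `tokenizeSketch` and `escapeForCharClass` are hypothetical, and the behavior is inferred solely from the parameter descriptions above (split characters and whitespace act as boundaries and are dropped; the merge character stays inside a token and is shown as a space).

```typescript
// Hypothetical helper: escape characters that are special inside a regex
// character class (backslash, brackets, caret, hyphen).
function escapeForCharClass(chars: string): string {
  return chars.replace(/[\\\]\[^-]/g, "\\$&");
}

// Hypothetical sketch of the documented behavior, NOT the library's code.
// Defaults mirror the documented defaults: tokenSplitChars ".-|", tokenMergeChar "+".
function tokenizeSketch(
  text: string,
  splitChars: string = ".-|",
  mergeChar: string = "+"
): string[] {
  // Whitespace and every split character are token boundaries; the boundary
  // characters themselves are discarded (i.e. "not rendered").
  const boundary = new RegExp(`[\\s${escapeForCharClass(splitChars)}]+`);
  return text
    .split(boundary)
    .filter((part) => part.length > 0)
    // The merge character keeps its parts in one token but displays as a space.
    .map((part) => (mergeChar ? part.split(mergeChar).join(" ") : part));
}

// Default settings break a Leipzig gloss apart at every period.
const withDefaults = tokenizeSketch("1SG.NOM go.PST.IPFV");

// Dropping the dot from the split characters keeps each morpheme gloss whole.
const leipzig = tokenizeSketch("1SG.NOM go.PST.IPFV", "-|");

// The merge character yields a single token displayed with a space.
const merged = tokenizeSketch("is+playing");

console.log(withDefaults, leipzig, merged);
```

Under these assumptions the default call yields five tokens (`1SG`, `NOM`, `go`, `PST`, `IPFV`), the `"-|"` call yields the two tokens `1SG.NOM` and `go.PST.IPFV`, and `is+playing` becomes the single token `is playing`. Connection indices refer to token positions, so the choice of split characters changes which index each gloss item receives.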