- Fix two remaining "article bug" examples (world → le + monde, not just monde):
SKILL.md minimal request and references custom-style example
- Replace minimal request with an unambiguous 1:1 example (I sleep / Я сплю)
- Rewrite "Word index counting" to spell out the nuances an unfamiliar
agent needs: split char is consumed/not rendered, tokenSplitChars is
configurable and shifts indices, punctuation stays attached, merge char,
RTL reading-order indexing
- Add Tokenization, Constraints & errors sections to references/api.md
- Clarify GET is lines-only (ignores alignments/settings/pairs)
All 10 examples and the punctuation/merge claims verified against the live API.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
word-aligner-skill/SKILL.md: 14 additions & 6 deletions (+14 −6)
@@ -14,12 +14,12 @@ Word Aligner generates shareable interactive diagrams showing which words in one

 ```json
 {
-  "lines": ["Hello world", "Bonjour le monde"],
-  "alignments": [[0, 0, 1, 0], [0, 1, 1, 2]]
+  "lines": ["I sleep", "Я сплю"],
+  "alignments": [[0, 0, 1, 0], [0, 1, 1, 1]]
 }
 ```

-`alignments` entries are `[lineA, wordA, lineB, wordB]` — 0-based indices, lines must be adjacent.
+`alignments` entries are `[lineA, wordA, lineB, wordB]`: word `wordA` of line `lineA` links to word `wordB` of line `lineB`. All indices are 0-based, and the two lines must be vertically adjacent (`|lineA − lineB| = 1`).

 ## Workflow

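The tuple rules in this hunk (four 0-based integers, adjacent lines, indices in range) can be checked locally before calling the API. The following is a minimal sketch only: `check_alignments` and its `word_counts` argument are hypothetical helpers invented for illustration, and the messages it produces are not the service's own error strings.

```python
def check_alignments(word_counts, alignments):
    """Return a list of problems found in [lineA, wordA, lineB, wordB] tuples.

    word_counts[i] is the number of tokens on line i. This is a hypothetical
    input: the service derives the counts itself from the submitted text.
    """
    problems = []
    for n, entry in enumerate(alignments):
        if len(entry) != 4:
            problems.append(f"alignments[{n}]: expected 4 integers")
            continue
        line_a, word_a, line_b, word_b = entry
        # The two lines must be vertically adjacent: |lineA - lineB| = 1.
        if abs(line_a - line_b) != 1:
            problems.append(
                f"alignments[{n}]: lines {line_a} and {line_b} are not adjacent"
            )
        # All indices are 0-based and must point at an existing line and word.
        for line, word in ((line_a, word_a), (line_b, word_b)):
            if not 0 <= line < len(word_counts):
                problems.append(f"alignments[{n}]: line {line} out of range")
            elif not 0 <= word < word_counts[line]:
                problems.append(
                    f"alignments[{n}]: word {word} out of range for line {line}"
                )
    return problems
```

For the new minimal request, `check_alignments([2, 2], [[0, 0, 1, 0], [0, 1, 1, 1]])` returns an empty list, since both lines have two words and the lines are adjacent.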
@@ -28,11 +28,19 @@ Word Aligner generates shareable interactive diagrams showing which words in one
 3. Call the API.
 4. Return the `url` to the user with a brief explanation.

-## Word index counting
+## Word index counting (read carefully — this is the #1 source of mistakes)
+
+Word indices are token positions, so you must tokenize a line exactly the way the service does before assigning indices:
+
+1. **Whitespace always splits.** `"I have been going"` → `I`[0] `have`[1] `been`[2] `going`[3].
+2. **The `tokenSplitChars` characters also split, and are then removed from the output.** The default set is `.-|`. So `"go.PST.IPFV"` becomes three *separate* tokens `go` `PST` `IPFV` and **the dots disappear from the rendered diagram**. This is usually not what you want for Leipzig glosses — see the gloss pattern below, which sets `tokenSplitChars` to `"-|"` to keep the dots.
+3. **Punctuation stays attached by default.** `"Hello, world!"` → `Hello,`[0] `world!`[1] (the comma and exclamation mark are part of the tokens, not separate).
+4. **The merge char `+` joins parts into one token** rendered with a space: `"is+playing"` is a single token (index counts as one word) that displays as `is playing`.
+5. **RTL lines:** word 0 is the logically first word (the rightmost one on screen for Hebrew/Arabic). Index in reading order, not visual order.

-Count left to right from 0, splitting on whitespace. Characters `.` `-` `|` also split. For RTL lines, word 0 is the logically first word (rightmost on screen).
+Whenever you set `tokenSplitChars` in `settings`, recount every line's indices using that same split set — changing it shifts all the indices on every line.

-If uncertain about tokenization, call `GET https://aligner.tinygods.dev/api/align?lines=your+text` first and open the URL to count word boxes in the editor.
+If unsure, call `GET https://aligner.tinygods.dev/api/align?lines=your+text` first and open the URL to count the word boxes in the editor.
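The five counting rules added above can be approximated offline with a few lines of Python. This is a sketch under assumptions, not the service's tokenizer: the `tokenize` function and its parameter names are hypothetical, and it assumes that runs of adjacent split characters collapse and that empty tokens are dropped, which the diff does not state.

```python
import re


def tokenize(line, split_chars=".-|", merge_char="+"):
    """Approximate the documented tokenization rules (illustrative only)."""
    # Whitespace always splits; each split character also splits and is
    # consumed, so it never appears in the resulting tokens.
    if split_chars:
        pattern = "[" + re.escape(split_chars) + r"\s]+"
    else:
        pattern = r"\s+"
    # Assumption: empty pieces (leading, trailing, doubled separators) are dropped.
    parts = [piece for piece in re.split(pattern, line) if piece]
    # The merge character keeps its parts in one token, displayed with a space.
    if merge_char:
        parts = [piece.replace(merge_char, " ") for piece in parts]
    # Python strings are stored in logical (reading) order, so index 0 of an
    # RTL line is already the logically first word.
    return parts
```

With the defaults, `tokenize("go.PST.IPFV")` gives three tokens, while `tokenize("go.PST.IPFV", split_chars="-|")` keeps the dots and gives one, which is why indices have to be recounted whenever `tokenSplitChars` changes.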
word-aligner-skill/references/api.md: 36 additions & 5 deletions (+36 −5)
@@ -77,18 +77,49 @@ Controls for a specific adjacent line pair. `lower` must equal `upper + 1`.

 Each tuple is `[lineA, wordA, lineB, wordB]`:
 - All indices are **0-based**
-- `lineA` and `lineB` must be **adjacent**: `|lineA − lineB| = 1`
-- Multiple tuples sharing the same word form a **color group** automatically
+- `lineA` and `lineB` must be **adjacent**: `|lineA − lineB| = 1` (you cannot connect across a line; stack intermediate tiers instead)
+- Multiple tuples sharing the same word form a **color group** automatically (one-to-many, many-to-one, and many-to-many all work)

 ---

-## GET /api/align (simple, no alignments)
+## Constraints and errors
+
+- **1–8 lines.** Fewer than 1 or more than 8 is rejected.
+- **Adjacency:** alignment lines must differ by exactly 1; `pairs` require `lower = upper + 1`.
+- **Index ranges:** line and word indices must be in range for the (tokenized) text.
+- Numeric settings are clamped, not rejected: `lineThickness`→1–8, `lineOpacity`→0.2–1, `sizePx`→12–64, line `gapPx`→0–56, pair `gapPx`→12–156.
+
+On invalid input the API returns **HTTP 400** with a JSON body:
+
+```json
+{ "error": "alignments[0]: word 4 out of range for line 0 (\"1SG.NOM go.PST.IPFV\" has 2 word(s))" }
+```
+
+The error message names the offending field, index, and the tokenized word count — read it to fix indices.
+
+---
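The "clamped, not rejected" behaviour for numeric settings can be previewed client-side. The sketch below takes its ranges from the list in this hunk; the dictionary keys are hypothetical labels rather than confirmed field paths in the request body, and `clamp_setting` and `line_count_ok` are illustrative names, not part of the API.

```python
# Ranges quoted from the constraints list. The keys are hypothetical labels
# that tell the two gapPx settings apart; they are not confirmed JSON paths.
CLAMP_RANGES = {
    "lineThickness": (1, 8),
    "lineOpacity": (0.2, 1),
    "sizePx": (12, 64),
    "line gapPx": (0, 56),
    "pair gapPx": (12, 156),
}


def clamp_setting(name, value):
    """Pull an out-of-range numeric setting back to the nearest allowed bound."""
    low, high = CLAMP_RANGES[name]
    return max(low, min(high, value))


def line_count_ok(lines):
    """Requests must carry between 1 and 8 lines; anything else is rejected."""
    return 1 <= len(lines) <= 8
```

Under these assumptions a `lineThickness` of 20 would be drawn as 8 rather than causing an HTTP 400, whereas a request with nine lines would be rejected outright.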
+
+## Tokenization and word indices
+
+Word indices in `alignments` and `pairs` refer to **token positions**, so tokenize each line the way the service does before counting:
+
+- **Whitespace always splits.**
+- **`tokenSplitChars` (default `.-|`) also splits, and the split character is removed from the rendered output.** `"go.PST.IPFV"` → three tokens `go` `PST` `IPFV` with the dots gone. Override `tokenSplitChars` (e.g. to `"-|"`) to keep characters you want displayed.
+- **Punctuation stays attached by default** (the API does not split punctuation). `"Hello, world!"` → `Hello,`[0] `world!`[1].
+- **The merge char `+` (default) joins parts into one token** displayed with a space: `"is+playing"` is one token rendered `is playing`.
+- **RTL lines are indexed in reading order** — word 0 is the logically first word (rightmost on screen).
+
+Changing `tokenSplitChars` shifts every line's indices — recount after setting it.
+
+---
+
+## GET /api/align (simple, lines only)

 ```
 GET /api/align?lines=Hello+world&lines=Bonjour+le+monde
 ```

-Returns the same `{ "url": "..." }` response. Useful for opening the editor pre-filled with text, without pre-drawn links. Helpful for verifying word tokenization.
+Returns the same `{ "url": "..." }` response. **Lines only** — this endpoint ignores `alignments`, `settings`, and `pairs`; use POST for those. Useful for opening the editor pre-filled with text (no pre-drawn links) and for verifying how a line tokenizes: open the URL and count the word boxes.

 ---

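The lines-only GET request can be assembled with the standard library, which takes care of the repeated `lines` parameter and the `+` encoding of spaces. A small sketch follows; the host is taken from the SKILL.md text in this commit, while `build_get_url` is a hypothetical helper name and the snippet only builds the URL without sending the request.

```python
from urllib.parse import urlencode

# Host as it appears in SKILL.md; the path matches the heading in this hunk.
BASE_URL = "https://aligner.tinygods.dev/api/align"


def build_get_url(lines):
    """Build the lines-only GET URL, repeating `lines` once per line of text."""
    # urlencode applies quote_plus to each value, so spaces become '+' and
    # non-ASCII text is percent-encoded as UTF-8.
    query = urlencode([("lines", text) for text in lines])
    return f"{BASE_URL}?{query}"
```

`build_get_url(["Hello world", "Bonjour le monde"])` yields the same query string as the example request in the hunk, prefixed with the host.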
@@ -107,7 +138,7 @@ Returns the same `{ "url": "..." }` response. Useful for opening the editor pre-