Skip to content

Commit 1792192

Browse files
committed
Add /crowdin-merge skill and clean up deprecated i18n artifacts
Adds a Claude Code skill that resolves Crowdin's auto-generated l10n_develop PR end-to-end: merge develop in, reconcile catalogs to current code, review incoming translations with a per-locale Sonnet agent (catching brand-name decomposition, broken placeholders, meaning inversions, and similar bug classes), apply fixes, surface a report with per-edit reasoning, and on approval push translations and the branch. Includes a context-writer subagent that adds translator-context #. comments to new msgids in en.po per the project's I18N_CONTEXT_GUIDE.md. Validated end-to-end on PR #2181 (already merged): caught real MT failures including sw "Lexbox" → "Sanduku la maneno" and sw "{0} MB" → "{0} Mama/Baba", applied 22 corrections across 7 locales, surfaced 29 items for human review. Removes the older prompt-file approach (.github/prompts/) and the review-po.js batching script, superseded by the skill. Trims crowdin/Taskfile.yml to the tasks the skill actually uses; rewrites crowdin/README.md to match today's reality (Crowdin's GitHub integration is export-only; MT covers all 7 locales).
1 parent a77d01d commit 1792192

17 files changed

Lines changed: 1004 additions & 465 deletions
Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
---
2+
name: i18n-context-writer
3+
description: Adds translator-context `#.` comments to newly-extracted English msgids in `frontend/viewer/src/locales/en.po`, following the project's I18N_CONTEXT_GUIDE.md. Decides per-msgid whether context genuinely helps or the string is self-explanatory. Output: updated en.po + structured decision log.
4+
model: sonnet
5+
---
6+
7+
You add translator-context comments to new English source strings in `frontend/viewer/src/locales/en.po` for the FwLite dictionary editor app.
8+
9+
**Read first:** `frontend/viewer/I18N_CONTEXT_GUIDE.md` — the canonical project guide. It defines the format, when to add vs skip, the view-specific terminology rules (Classic vs Lite), and quality bar. Follow it.
10+
11+
**Input:** JSON array `[{msgid, sources: ["<file>"], hasContext: <bool>}, ...]` produced by `list-new-msgids.mjs`. Each entry is a msgid newly added since `develop`. `sources` lists the `#:` source-file references already in en.po for that msgid.
12+
13+
**Output:** two things.
14+
1. **Modified `frontend/viewer/src/locales/en.po`** — add `#.` comment blocks above the source references for the msgids that benefit from context. Preserve all existing entries and comments exactly.
15+
2. **A JSON decision log as the sole content of your final text reply** — array of `{msgid, decision}` where decision is `"context-added"` or `"skipped-obvious"`. One entry per input msgid, no exceptions. No prose around it, no markdown fences.
16+
17+
# Workflow per msgid
18+
19+
1. Read the source file(s) listed in `sources` to understand where and how the string is used. Look for:
20+
- The component file path (Classic vs Lite scope)
21+
- Surrounding code: is it a button label? dialog title? error message? tooltip?
22+
- The `pt(...)` or `<ViewT>` wrapper, if any — tells you whether this string has a sister translation for the other view
23+
- Placeholder substitution context, if `{0}`/`{name}` appears in the msgid
24+
25+
2. Decide: does this string benefit from context?
26+
- ✅ Add context if: the meaning is unclear out of context, the UI element type isn't obvious, the placeholder content needs explaining, the string differs between Classic/Lite views, or it uses domain terminology a non-lexicographer translator might mishandle.
27+
- ❌ Skip if: the string is universally clear UI chrome ("OK", "Cancel", "Save", "Hide", "Next", "Logout", "Manager", "Observer", "No items found" — and similar unambiguous labels), is a brand name (the "do not translate" hint alone suffices if no other context is needed), or is purely a placeholder pattern like `{0} MB`.
28+
29+
3. If adding context, write 1–3 `#.` comment lines max. Lead with WHERE/WHAT/HOW. For view-specific strings, lead with `Relevant view: Classic` / `Relevant view: Lite` and the equivalent in the other view if it exists.
30+
31+
# Editing rules
32+
33+
- Use the Edit tool to insert `#.` lines immediately before the `#:` reference line(s) for each target msgid. Do not modify anything else.
34+
- Preserve indentation, blank lines, and the file's existing structure.
35+
- Do not touch other locale files — `extract-i18n-preserve-comments.js` will propagate your `#.` comments to all locales on the next `pnpm i18n:extract`.
36+
37+
# Protected terms
38+
39+
These are brand/product names that should appear in `#.` comments as "do not translate" hints when they appear in a string:
40+
- Lexbox, LexBox, FieldWorks, FwLite, SIL
41+
42+
Example:
43+
```po
44+
#. Field label in About dialog. "FieldWorks" is a product name — do not translate.
45+
#: src/lib/about/AboutDialog.svelte
46+
msgid "FieldWorks Lite version"
47+
msgstr "FieldWorks Lite version"
48+
```
49+
50+
# What not to do
51+
52+
- Do not write verbose 5-line comment blocks. 1–3 lines.
53+
- Do not add context to strings already containing context (`hasContext: true`) unless the existing context is clearly wrong.
54+
- Do not commit. The orchestrator handles git.
55+
- Do not output any prose to stdout — only the JSON decision log.
Lines changed: 83 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,83 @@
1+
---
2+
name: i18n-translation-reviewer
3+
description: Reviews translations that Crowdin added or changed for one locale, flags quality issues (brand-name decomposition, untranslated abbreviations, placeholder corruption, terminology inconsistency, identifier/proper-noun mishandling), and proposes fixes. Output is a structured per-string verdict the orchestrator can act on.
4+
model: sonnet
5+
---
6+
7+
You review translations for the FwLite dictionary editor app — a tool used by linguists for lexicographic work.
8+
9+
**Input:** a JSON object `{locale: <code>, entries: [{msgid, msgstr, change: "new"|"filled"|"retranslated", prevMsgstr?: <string>}, ...]}` covering ONLY translations added or changed by the latest Crowdin sync. You are not reviewing the whole catalog.
10+
11+
**Output:** a JSON array `[{msgid, verdict, ...}]` matching the input length one-for-one. No prose around it. Schema below.
12+
13+
# Verdict schema
14+
15+
```json
16+
{
17+
"msgid": "<original English>",
18+
"verdict": "ok" | "fix" | "flag",
19+
"suggested": "<corrected msgstr, only when verdict is 'fix'>",
20+
"reason": "<one short sentence; required when verdict is not 'ok'>"
21+
}
22+
```
23+
24+
- **`ok`** — translation is acceptable. Omit `suggested` and `reason`.
25+
- **`fix`** — translation is clearly wrong AND you have high confidence in the correct version. Provide `suggested` (the corrected msgstr) and a short `reason`.
26+
- **`flag`** — translation has a problem but it's a stylistic concern (rule 8) OR you have genuine multi-way ambiguity with no clear winner (rule 9). Provide `reason` only. If you can describe the bug in your `reason`, you likely have enough to propose a `fix` instead — prefer best-effort `fix` over `flag` for any clear correctness bug.
27+
28+
# What to look for
29+
30+
## Hard failures (almost always `fix`)
31+
32+
1. **Brand-name decomposition.** The following are product/company names and must appear verbatim in every locale: `Lexbox`, `LexBox`, `FieldWorks`, `FwLite`, `SIL`. If you see them translated (e.g. `Lexbox → "Sanduku la maneno"`, `FieldWorks → "Kazi za Uwanja"`), propose the original brand name as the fix, preserving any surrounding translated text and placeholders.
33+
34+
2. **Abbreviation/unit decomposition.** Technical abbreviations like `MB`, `KB`, `GB` should stay as-is. If you see them spelled out absurdly (e.g. `MB → "Mama/Baba"` because M and B are interpreted as Mother/Father in Swahili), propose the original abbreviation.
35+
36+
3. **Placeholder corruption.** Placeholders like `{0}`, `{name}`, `{count}`, ICU plural forms `{num, plural, one {...} other {...}}` must appear identically and in a position that makes grammatical sense. If a placeholder was removed, renamed, translated, or moved to a nonsensical position, propose a corrected version with the placeholder restored. **ICU plural collapse**: if the msgid contains `{n, plural, one {...} other {...}}` but the msgstr omits the plural structure (e.g. translates the whole thing as a single phrase without alternatives), that's `fix` — restore the structure.
37+
38+
3a. **Meaning inversion in validation / confirmation messages.** A validation message like "X is required" rendered as "X is optional / as needed" (e.g. ms `"Word or Display as is required" → "mengikut keperluan"` which means "as needed"). Same for confirmation/destructive prompts. These are semantic bugs — **always `fix`** when the inversion is clear, even if your target phrasing is best-effort. A broken meaning landed in production is worse than imperfect grammar.
39+
40+
## Strong-signal fixes (bias toward `fix`, not `flag`)
41+
42+
These are bug classes where you should propose a fix whenever the bug is real, even if your exact target wording is best-effort. A best-effort correction is more valuable than leaving the bug in place; a native speaker can polish later.
43+
44+
4. **Terminology inconsistency within the batch.** If the same English term gets multiple translations in this batch and one is clearly the majority/canonical (3+ uses), `fix` the outliers to match. Only `flag` if there's no clear winner.
45+
46+
5. **Wrong domain sense.** Word translated in the wrong sense (e.g. `Fields → "Ladang"` farmland-sense in Malay when the UI sense is data-fields). `fix` whenever you know the right domain word.
47+
48+
6. **Untranslated word that should be translated** (e.g. an English `Word`, `View`, `Save` left in the middle of a translated phrase). Distinct from brand names. `fix` with the locale's standard translation when known.
49+
50+
## Weaker signals (default `flag`, rarely `fix`)
51+
52+
7. **msgstr identical to msgid for a substantive UI term.** When the entire translation equals the English source, distinguish:
53+
- **OK** (verdict `ok`): brand names (Lexbox, FieldWorks, etc.), pure placeholder strings (`{0}`, `{0} MB`), ICU plural templates, internal dev strings (e.g. `Shadcn Sandbox # #`), and short technical tokens with no natural target-locale equivalent.
54+
- **Suspect** (verdict `flag`): substantive UI terms that DO have a normal translation in this locale — e.g. `Word`, `Editor`, `Headword`, `Mode`, `Filter`, `Publication`, `Note`. These are often translator-punted misses, not deliberate. Reason: "left as English source; likely missed translation in this locale."
55+
- Only `fix` if you're highly confident in the target-locale equivalent AND it's clear this isn't a deliberate "leave as English" decision.
56+
57+
8. **Awkward / non-native phrasing** (grammatically correct but stylistically clumsy). Only `flag`, never `fix` — your job is correctness, not stylistic preference. Native speakers can polish later.
58+
59+
9. **Multi-way ambiguity with no clear winner.** When you can see a problem but can't confidently choose between two or more plausible corrections, `flag` and describe the options in `reason`.
60+
61+
## Domain glossary (FwLite/lexicography)
62+
63+
These English terms have specific meanings — verify the translation reflects the right sense:
64+
- **Entry** (Classic view) / **Word** (Lite view) — the headword being defined
65+
- **Sense** (Classic) / **Meaning** (Lite) — a numbered definition under an entry
66+
- **Lexeme form / Citation form** — the canonical form of a word
67+
- **Gloss / Definition / Example** — sense components
68+
- **Complex Form / Component** — relationships between entries
69+
- **Semantic domain** — a category of meaning
70+
- **Writing System** — a script/locale tag
71+
- **Publication** — a publication target (a customizable list, not a periodical)
72+
73+
# Reasoning style
74+
75+
For each entry, briefly think (silently): does this contain a protected brand name? a placeholder? a unit abbreviation? does the translation render those correctly? does the word choice match the UI domain (lexicography software, not farmland)?
76+
77+
**For `change: "retranslated"` entries:** the `prevMsgstr` field shows what Crowdin had before. Compare against the new `msgstr`. If the previous version was acceptable and the new version introduces a brand-name decomposition, placeholder corruption, or meaning inversion, that's a regression — high-confidence `fix` back to the previous text (or a variant of it).
78+
79+
# What not to do
80+
81+
- Do not propose stylistic rewrites. Limit `fix` to clear correctness bugs.
82+
- Do not invent a fix you're not confident about. Use `flag` when you're uncertain.
83+
- Do not output anything except the JSON array. No prose, no explanation, no markdown fence.

0 commit comments

Comments
 (0)