Commit f2342f3

Improve handling of singular strings in plural-form batches
When batches contain both singular and plural entries, the AI sometimes returns `<f0>...<fN>` form tags for singular strings (e.g., "30 files"). The parser accepted all forms, and the post-validator blanked them for placeholder mismatches.

- Clarify prompt to show both formats when batch has mixed entries
- Extract only f0 when AI returns form tags for singular entries
- Initialize singular entries with 1 msgstr slot instead of pluralCount
1 parent 19fa627 commit f2342f3

5 files changed

Lines changed: 61 additions & 16 deletions

config/prompt.md

Lines changed: 9 additions & 9 deletions
@@ -3,21 +3,21 @@ You are a professional translator specializing in software localization. Your ta
 ### Instructions

 1. **Translate each XML tag** and respond with the translated content in the same XML structure.
-2. **CRITICAL — Preserve placeholders exactly**: The translation MUST contain the **exact same placeholders** as the source (%s, %d, %1$s, etc.) — no more, no fewer. If none in source, none in translation. Use ASCII %, never ％.
+2. **CRITICAL — Preserve placeholders exactly**: The translation MUST contain the **exact same placeholders** as the source (%s, %d, %1$s, etc.) — no more, no fewer. If none in source, none in translation. Do NOT add %d or any placeholder that is not in the source. Always use standard ASCII percent sign (%), never the fullwidth ％ (U+FF05).
 3. **Preserve bracket placeholders**: [example], [product], {value} — keep as-is. Do NOT convert to %s/%1$s.
 4. **Preserve bracket tags**: [strong], [/strong], [link], [/link] — keep as-is. Do NOT convert to HTML.
 5. **Preserve formatting**: Keep line breaks, HTML tags, trailing whitespace unchanged.
 6. **Context/comments**: Use provided context and comment attributes.
-7. **Plural forms**: Provide exactly {{PLURAL_COUNT}} DISTINCT translations (f0, f1, ...). Rules:
-   - Arabic(6): f0=zero(plural), f1=one(singular,drop %d), f2=two(dual تان/تين,drop %d), f3=few(plural), f4=many(singular+%d), f5=other(singular+%d)
-   - Russian/Polish(3): f0=singular, f1=paucal(2-4), f2=plural(5+)
-   - French(2): f0=0+1, f1=2+
-   - Japanese/Chinese/Korean(1): f0 only, MUST keep %d
+7. **Singular vs. plural entries**:
+   - **Singular entries** (no `<singular>`/`<plural>` tags): Provide ONE translation. Do NOT use `<f0>`, `<f1>`, etc. — just `<t i="N">translation</t>`. Translate the string literally, even if it contains a number (e.g., "30 files" → translate as-is, do NOT split into plural forms).
+   - **Plural entries** (with `<singular>` and `<plural>` tags): Provide exactly {{PLURAL_COUNT}} DISTINCT translations using `<f0>`, `<f1>`, etc. Only use placeholders that exist in the source — if the source uses [number] instead of %d, use [number] in ALL forms, never add %d. Rules:
+      - Arabic(6): f0=zero(plural), f1=one(singular, may drop %d if source has it), f2=two(dual تان/تين, may drop %d if source has it), f3=few(plural), f4=many(plural), f5=other(plural)
+      - Russian/Polish(3): f0=singular, f1=paucal(2-4), f2=plural(5+)
+      - French(2): f0=0+1, f1=2+
+      - Japanese/Chinese/Korean(1): f0 only, MUST keep all placeholders

 ### Output Format

-```xml
-<t i="N">translation</t>
-```
+Singular: `<t i="N">translation</t>`

 Plural: `<t i="N"><f0>...</f0><f1>...</f1>...</t>`
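
Rule 2 of the prompt asks for the translation to carry exactly the placeholders of the source. A minimal sketch of how such a comparison could be checked on the parser side follows. The names `extractPlaceholders` and `hasSamePlaceholders` are hypothetical and do not come from this repository, and the pattern only covers the printf-style forms named in the prompt (`%s`, `%d`, `%1$s`); bracket placeholders and literal `%%` are left out.

```javascript
// Hypothetical helpers for illustration only; not the project's validator.
// Matches %s, %d, %f and their positional variants such as %1$s.
const PLACEHOLDER_RE = /%(?:\d+\$)?[sdf]/g;

function extractPlaceholders(text) {
  // Sort so that reordered placeholders still compare equal.
  return (text.match(PLACEHOLDER_RE) || []).sort();
}

function hasSamePlaceholders(source, translation) {
  // The prompt forbids the fullwidth percent sign (U+FF05) outright.
  if (translation.includes('\uFF05')) return false;

  const expected = extractPlaceholders(source);
  const actual = extractPlaceholders(translation);

  return (
    expected.length === actual.length &&
    expected.every((placeholder, i) => placeholder === actual[i])
  );
}
```

Under this sketch, a singular source such as "30 files" has no placeholders, so any returned form that introduces `%d` would fail the comparison, which is the mismatch the commit message describes.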

src/utils/xmlTranslation.js

Lines changed: 18 additions & 5 deletions
@@ -75,11 +75,12 @@ export function buildXmlPrompt(batch, targetLang, pluralCount, dictionaryMatches
   prompt += xmlTags + '\n\nRespond:\n';

   if (hasPlurals) {
-    prompt += `For entries with <singular> and <plural> tags, provide ${pluralCount} translations:\n\n`;
-
     const formTags = Array.from({ length: pluralCount }, (_, i) => `<f${i}>translation for form ${i}</f${i}>`).join('');

-    prompt += `Format: <t i="N">${formTags}</t>\n`;
+    prompt += `For entries with <singular> and <plural> tags, provide ${pluralCount} plural forms:\n`;
+    prompt += `Format: <t i="N">${formTags}</t>\n\n`;
+    prompt += `For all other entries (no <singular>/<plural>), provide a single translation:\n`;
+    prompt += `Format: <t i="N">translation</t>\n`;
   } else {
     prompt += `Format: <t i="N">translation</t>\n`;
   }
@@ -200,7 +201,7 @@ function validatePluralForms(forms, originalMsgid, expectedCount, itemId, logger
 export function parseXmlResponse(xmlResponse, batch, pluralCount, logger, dictionaryCount = 0, verbosityLevel = 1) {
   const result = batch.map((entry) => ({
     msgid: entry.msgid,
-    msgstr: Array(pluralCount).fill(''),
+    msgstr: Array(entry.msgid_plural ? pluralCount : 1).fill(''),
   }));

   const validationStats = createEmptyValidationStats();
@@ -249,8 +250,9 @@ export function parseXmlResponse(xmlResponse, batch, pluralCount, logger, dictio
     }

     const hasFormTags = block.includes('<f0>');
+    const isSingularEntry = !batch[batchIndex].msgid_plural;

-    if (hasFormTags) {
+    if (hasFormTags && !isSingularEntry) {
       const forms = [];

       for (let i = 0; i < pluralCount; i++) {
@@ -278,6 +280,17 @@ export function parseXmlResponse(xmlResponse, batch, pluralCount, logger, dictio
       }

       result[batchIndex].msgstr = correctedForms;
+    } else if (hasFormTags && isSingularEntry) {
+      // AI incorrectly returned plural forms for a singular entry.
+      // Extract only form 0 and discard the rest.
+      const formRegex = /<f0>(.*?)<\/f0>/s;
+      const formMatch = block.match(formRegex);
+
+      if (formMatch) {
+        const translation = normalizeNbsp(decodeXmlEntities(formMatch[1]), batch[batchIndex].msgid);
+
+        result[batchIndex].msgstr = [translation];
+      }
     } else {
       const contentMatch = block.match(/<t[^>]*>(.*?)<\/t>/s);

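
The three branches touched by these hunks can be read in isolation as the simplified sketch below. It is not the project's parser: `pickTranslation` is a hypothetical name, and entity decoding, nbsp normalization, and plural-form validation (`decodeXmlEntities`, `normalizeNbsp`, `validatePluralForms` in the real code) are deliberately left out so that only the slot sizing and branch selection remain.

```javascript
// Simplified, hypothetical stand-in for the per-block logic of parseXmlResponse.
// `block` is one <t i="N">...</t> fragment; `entry` is the matching batch entry.
function pickTranslation(block, entry, pluralCount) {
  const isSingularEntry = !entry.msgid_plural;

  // Singular entries get one slot; plural entries get one slot per form.
  const msgstr = Array(isSingularEntry ? 1 : pluralCount).fill('');
  const hasFormTags = block.includes('<f0>');

  if (hasFormTags && !isSingularEntry) {
    // Plural entry: collect every form that is present.
    for (let i = 0; i < pluralCount; i++) {
      const match = block.match(new RegExp(`<f${i}>(.*?)</f${i}>`, 's'));
      if (match) msgstr[i] = match[1];
    }
  } else if (hasFormTags && isSingularEntry) {
    // Singular entry answered with form tags: keep form 0, drop the rest.
    const match = block.match(/<f0>(.*?)<\/f0>/s);
    if (match) msgstr[0] = match[1];
  } else {
    // Plain <t> content.
    const match = block.match(/<t[^>]*>(.*?)<\/t>/s);
    if (match) msgstr[0] = match[1];
  }

  return msgstr;
}
```

Because the discarded forms never reach the result array in this sketch, a later placeholder check would only ever see form 0 for a singular entry.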

tests/integration/openai-with-dictionary.test.js

Lines changed: 1 addition & 1 deletion
@@ -223,7 +223,7 @@ describe('OpenAI Provider with Dictionary Integration', () => {
     expect(promptMessage).toContain('<source i="1" placeholders="none">item</source>');
     expect(promptMessage).toContain('<singular>One item</singular>');
     expect(promptMessage).toContain('<plural>%d items</plural>');
-    expect(promptMessage).toContain('For entries with <singular> and <plural> tags, provide 2 translations');
+    expect(promptMessage).toContain('For entries with <singular> and <plural> tags, provide 2 plural forms');
   });

   it('should handle API errors gracefully with dictionary', async () => {

tests/unit/placeholderValidation.test.js

Lines changed: 32 additions & 0 deletions
@@ -539,4 +539,36 @@ describe('placeholder validation', () => {
       expect(translations[0].msgstr[0]).toBe('لا عناصر');
     });
   });
+
+  describe('singular entries with form tags', () => {
+    it('should extract only f0 when AI returns plural forms for a singular entry', () => {
+      // AI incorrectly returns <f0>...<f5> for a singular string like "30 files".
+      const batch = [{ msgid: '30 files' }];
+      const forms = ['30 ملفًا', '30 ملف', '30 ملفان', '%d ملفات', '%d ملفًا', '%d ملف'];
+      const { translations } = parseXmlResponse(xml([forms]), batch, 6, mockLogger);
+
+      expect(translations[0].msgstr).toHaveLength(1);
+      expect(translations[0].msgstr[0]).toBe('30 ملفًا');
+    });
+
+    it('should initialize singular entries with 1 msgstr slot', () => {
+      const batch = [{ msgid: 'Hello' }, { msgid: 'One item', msgid_plural: '%d items' }];
+      const { translations } = parseXmlResponse('', batch, 6, mockLogger);
+
+      expect(translations[0].msgstr).toHaveLength(1);
+      expect(translations[1].msgstr).toHaveLength(6);
+    });
+
+    it('should not trigger placeholder warnings for discarded plural forms', () => {
+      const { logger, warnings } = createSpyLogger();
+      const batch = [{ msgid: '30 files' }];
+      // AI adds %d in forms 3-5 — these should be discarded, not warned about.
+      const forms = ['30 ملفًا', '30 ملف', '30 ملفان', '%d ملفات', '%d ملفًا', '%d ملف'];
+      parseXmlResponse(xml([forms]), batch, 6, logger);
+
+      const placeholderWarnings = warnings.filter((w) => w.includes('Placeholder mismatch'));
+
+      expect(placeholderWarnings).toHaveLength(0);
+    });
+  });
 });
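
The new tests call `xml(...)` and `createSpyLogger()`, which are defined outside this hunk. Their real definitions are not visible here, so the following is only a guess at their shape, written to make the tests above easier to follow. The 1-based `i` attribute and the logger method names (`warn`, `info`, `debug`, `error`) are assumptions.

```javascript
// Hypothetical stand-ins; the actual helpers in the test file may differ.

// Builds a response string: an array item becomes <f0>..<fN> form tags,
// a plain string becomes the direct content of <t>.
function xml(items) {
  return items
    .map((item, index) => {
      const body = Array.isArray(item)
        ? item.map((form, f) => `<f${f}>${form}</f${f}>`).join('')
        : item;
      return `<t i="${index + 1}">${body}</t>`;
    })
    .join('\n');
}

// Returns a logger that records warning messages so a test can inspect them.
function createSpyLogger() {
  const warnings = [];
  const logger = {
    warn: (message) => warnings.push(String(message)),
    info: () => {},
    debug: () => {},
    error: () => {},
  };
  return { logger, warnings };
}
```

With helpers of this shape, `xml([forms])` in the tests would produce a single `<t>` block containing six form tags, which is the malformed answer the singular-entry branch is meant to absorb.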

tests/unit/xmlTranslationWithDictionary.test.js

Lines changed: 1 addition & 1 deletion
@@ -59,7 +59,7 @@ describe('XML Translation with Dictionary', () => {
     expect(result.xmlPrompt).toContain('<source i="1" placeholders="none">item</source>');
     expect(result.xmlPrompt).toContain('<singular>One item</singular>');
     expect(result.xmlPrompt).toContain('<plural>%d items</plural>');
-    expect(result.xmlPrompt).toContain('For entries with <singular> and <plural> tags, provide 2 translations');
+    expect(result.xmlPrompt).toContain('For entries with <singular> and <plural> tags, provide 2 plural forms');
   });

   it('should escape XML in dictionary terms', () => {
