Commit 177713c

Merge pull request #46 from jakeboone02/i18n
Internationalization support
2 parents 8fe15d3 + 693a82b commit 177713c

7 files changed

Lines changed: 1098 additions & 44 deletions

CHANGELOG.md

Lines changed: 11 additions & 1 deletion
@@ -7,7 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
-N/A
+### Added
+
+- [#46] `parseIngredient` now accepts `Array<string>` as well as `string`. Each element of the array is treated as a single ingredient line.
+- [#46] Internationalization (i18n) support for parsing keywords
+  - `groupHeaderPatterns`: customizable words/patterns for group headers
+  - `rangeSeparators`: customizable range separator words/patterns (e.g., "bis", "oder", "à")
+  - `descriptionStripPrefixes`: customizable prefix words/patterns to strip from descriptions
+  - `trailingQuantityContext`: customizable context words indicating trailing quantities
+- [#46] `includeMeta` option to include source metadata (`sourceText` and `sourceIndex`) on each parsed ingredient
+- [#46] Deprecated legacy exports (`fors`, `forsRegEx`, `rangeSeparatorWords`, `rangeSeparatorRegEx`, `ofs`, `ofRegEx`, `froms`, `fromRegEx`) in favor of new configurable defaults and regex builders
 
 ## [v2.0.1] - 2026-01-30
 
@@ -209,6 +218,7 @@ N/A
 [#34]: https://github.com/jakeboone02/parse-ingredient/pull/34
 [#37]: https://github.com/jakeboone02/parse-ingredient/pull/37
 [#40]: https://github.com/jakeboone02/parse-ingredient/pull/40
+[#46]: https://github.com/jakeboone02/parse-ingredient/pull/46
 
 <!-- Release comparison links -->
 
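
The changelog entries for array input and `includeMeta` can be illustrated with a minimal sketch. The helper `toSourceLines` and the `SourceLine` type below are hypothetical and are not part of the library; trimming each line is also an assumption. The sketch only shows one way a `string | string[]` input could be turned into lines whose `sourceIndex` still counts empty lines, as the README example in this commit describes.

```typescript
// Hypothetical helper for illustration only; not parse-ingredient's implementation.
interface SourceLine {
  sourceText: string; // text of the line (trimmed here, which is an assumption)
  sourceIndex: number; // zero-based position in the original input
}

const toSourceLines = (input: string | string[]): SourceLine[] => {
  // A string is split on newlines; an array is taken as one line per element.
  const rawLines = typeof input === 'string' ? input.split(/\r?\n/) : input;
  return (
    rawLines
      // Index every line first so that empty lines still advance the counter.
      .map((text, index) => ({ sourceText: text.trim(), sourceIndex: index }))
      // Only then drop the empty lines.
      .filter(line => line.sourceText.length > 0)
  );
};

console.log(toSourceLines('1 cup flour\n\n2 tbsp sugar'));
// [
//   { sourceText: '1 cup flour', sourceIndex: 0 },
//   { sourceText: '2 tbsp sugar', sourceIndex: 2 }
// ]
```

Indexing before filtering is the design choice that matters here: filtering first would renumber the second ingredient to 1 and lose the link back to the original input.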

README.md

Lines changed: 133 additions & 0 deletions
@@ -80,6 +80,8 @@ In the browser, all exports including the `parseIngredient` function are availab
 
 ## Usage
 
+The `parseIngredient` function accepts a string (with newline-separated ingredients) or an array of strings (one ingredient per element).
+
 ```js
 import { parseIngredient } from 'parse-ingredient';
 
@@ -223,6 +225,41 @@ parseIngredient('2 large eggs', { ignoreUOMs: ['large'] });
 // ]
 ```
 
+### `includeMeta`
+
+When `true`, each ingredient object will include a `meta` property containing source metadata:
+
+- `sourceText`: The original text of the ingredient line before parsing.
+- `sourceIndex`: The zero-based line number in the original input (accounts for empty lines).
+
+```js
+parseIngredient('1 cup flour\n\n2 tbsp sugar', { includeMeta: true });
+// [
+//   {
+//     quantity: 1,
+//     quantity2: null,
+//     unitOfMeasure: 'cup',
+//     unitOfMeasureID: 'cup',
+//     description: 'flour',
+//     isGroupHeader: false,
+//     meta: { sourceText: '1 cup flour', sourceIndex: 0 },
+//   },
+//   {
+//     quantity: 2,
+//     quantity2: null,
+//     unitOfMeasure: 'tbsp',
+//     unitOfMeasureID: 'tablespoon',
+//     description: 'sugar',
+//     isGroupHeader: false,
+//     meta: { sourceText: '2 tbsp sugar', sourceIndex: 2 },
+//   },
+// ]
+```
+
+## Internationalization (i18n)
+
+The library supports parsing ingredients in multiple languages through configurable keyword options. While unit names can be localized using `additionalUOMs`, the following options allow localization of parsing keywords and quantities.
+
 ### `decimalSeparator`
 
 The character used as a decimal separator in numeric quantities. Use `','` for European-style decimal commas (e.g., `'1,5'` for 1.5). Defaults to `'.'`.
@@ -241,6 +278,102 @@ parseIngredient('1,5 cups sugar', { decimalSeparator: ',' });
 // ]
 ```
 
+### `groupHeaderPatterns`
+
+Patterns to identify group headers (e.g., "For the icing:"). Strings are treated as prefix patterns matched at the start of the line followed by whitespace. RegExp patterns are used as-is for more complex matching. Defaults to `['For']`.
+
+```js
+// German group headers
+parseIngredient('Für den Teig:\n2 cups flour', {
+  groupHeaderPatterns: ['For', 'Für'],
+});
+// [
+//   { description: 'Für den Teig:', isGroupHeader: true, ... },
+//   { quantity: 2, unitOfMeasure: 'cups', description: 'flour', ... }
+// ]
+
+// French with regex pattern (matches "Pour la", "Pour le", "Pour un", etc.)
+parseIngredient('Pour la pâte:', {
+  groupHeaderPatterns: ['For', /^Pour\s/iu],
+});
+```
+
+### `rangeSeparators`
+
+Words or patterns to identify ranges between quantities (e.g., "1 to 2", "1 or 2"). Dash characters (-, –, —) are always recognized. Defaults to `['to', 'or']`.
+
+```js
+// German range separators
+parseIngredient('1 bis 2 cups flour', {
+  rangeSeparators: ['to', 'or', 'bis', 'oder'],
+});
+// [{ quantity: 1, quantity2: 2, ... }]
+
+// French range separator
+parseIngredient('2 à 3 cups sugar', {
+  rangeSeparators: ['to', 'or', 'à', 'ou'],
+});
+```
+
+### `descriptionStripPrefixes`
+
+Words or patterns to strip from the beginning of ingredient descriptions. Commonly used to remove "of" from phrases like "1 cup of sugar". Strings are matched as whole words followed by whitespace. RegExp patterns are used as-is, which is useful for languages with contractions or elisions. Defaults to `['of']`.
+
+> **Note:** This option is only applied when `allowLeadingOf` is `false` (the default). If `allowLeadingOf` is `true`, prefix stripping is disabled entirely and this option is ignored.
+
+```js
+// Spanish "de" stripping
+parseIngredient('2 tazas de azúcar', {
+  descriptionStripPrefixes: ['of', 'de'],
+});
+// [{ description: 'azúcar', ... }]
+
+// French with regex patterns for elisions/contractions
+parseIngredient("2 tasses d'huile", {
+  descriptionStripPrefixes: [/de\s+la\s+/iu, /de\s+l'/iu, /d'/iu, 'de'],
+});
+// [{ description: 'huile', ... }]
+```
+
+### `trailingQuantityContext`
+
+Words that indicate a trailing quantity extraction context, used to identify patterns like "Juice of 3 lemons". Defaults to `['from', 'of']`.
+
+```js
+// German context word
+parseIngredient('Saft von 3 Zitronen', {
+  trailingQuantityContext: ['from', 'of', 'von'],
+});
+// [{ quantity: 3, description: 'Saft von Zitronen', ... }]
+```
+
+### Full i18n Example (German)
+
+```js
+parseIngredient(
+  `Für den Kuchen:
+2 bis 3 Tassen Mehl
+1 Tasse Zucker`,
+  {
+    groupHeaderPatterns: ['For', 'Für'],
+    rangeSeparators: ['to', 'or', 'bis', 'oder'],
+    decimalSeparator: ',',
+    additionalUOMs: {
+      tasse: {
+        short: 'T',
+        plural: 'Tassen',
+        alternates: ['Tasse'],
+      },
+    },
+  }
+);
+// [
+//   { description: 'Für den Kuchen:', isGroupHeader: true, ... },
+//   { quantity: 2, quantity2: 3, unitOfMeasure: 'Tassen', description: 'Mehl', ... },
+//   { quantity: 1, unitOfMeasure: 'Tasse', description: 'Zucker', ... }
+// ]
+```
+
 ## Unit Conversion
 
 ### `convertUnit`
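
The string-versus-RegExp behaviour described for `descriptionStripPrefixes` can be checked in isolation. The sketch below copies `escapeRegex` and `buildStripPrefixRegex` from the `src/constants.ts` changes in this commit so that it runs without the package installed; the wrapper `stripPrefix` is a hypothetical name added for illustration, and how the parser actually applies the regex internally is not shown in this diff.

```typescript
// Copied from this commit's src/constants.ts so the example is self-contained.
const escapeRegex = (str: string): string => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const buildStripPrefixRegex = (patterns: (string | RegExp)[]): RegExp => {
  const parts = patterns.map(p =>
    // RegExp sources are used as-is; strings must be followed by whitespace.
    p instanceof RegExp ? `(?:${p.source})` : `(?:${escapeRegex(p)})\\s+`
  );
  return new RegExp(`^(?:${parts.join('|')})`, 'iu');
};

// Hypothetical wrapper: removes a matching prefix from a description.
const stripPrefix = (description: string, patterns: (string | RegExp)[]): string =>
  description.replace(buildStripPrefixRegex(patterns), '');

const french: (string | RegExp)[] = [/de\s+la\s+/iu, /de\s+l'/iu, /d'/iu, 'de'];

console.log(stripPrefix("d'huile", french)); // 'huile'
console.log(stripPrefix('de la farine', french)); // 'farine'
console.log(stripPrefix('de azúcar', ['of', 'de'])); // 'azúcar'
console.log(stripPrefix('dessert wine', ['of', 'de'])); // 'dessert wine' (no whitespace after "de")
```

Because plain strings require trailing whitespace, a word that merely starts with a prefix is left alone, while the elided form `d'` needs a RegExp since no whitespace follows it.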

src/constants.ts

Lines changed: 136 additions & 20 deletions
@@ -1,6 +1,84 @@
 import { numericRegex } from 'numeric-quantity';
 import { ParseIngredientOptions, UnitOfMeasureDefinitions } from './types';
 
+// --- i18n Utilities ---
+
+/**
+ * Escapes special regex characters in a string.
+ */
+export const escapeRegex = (str: string): string => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
+
+/**
+ * Builds a regex that matches any of the given patterns at the start of a string,
+ * followed by whitespace. Strings are escaped and treated as literal prefixes.
+ * RegExp patterns have their source extracted and combined.
+ */
+export const buildPrefixPatternRegex = (patterns: (string | RegExp)[]): RegExp => {
+  const parts = patterns.map(p =>
+    p instanceof RegExp ? `(?:${p.source})` : `(?:${escapeRegex(p)})\\s`
+  );
+  return new RegExp(`^(?:${parts.join('|')})`, 'iu');
+};
+
+/**
+ * Builds a regex source string for range separators (dashes and word separators).
+ * Always includes dash characters (-, –, —), plus any custom word separators.
+ */
+export const buildRangeSeparatorSource = (words: (string | RegExp)[]): string => {
+  const wordParts = words.map(w =>
+    w instanceof RegExp ? `(?:${w.source})` : `(?:${escapeRegex(w)})`
+  );
+  // Always include dashes, then word separators followed by whitespace
+  return `(-|–|—|(?:${wordParts.join('|')})\\s)`;
+};
+
+/**
+ * Builds a regex that matches range separators at the start of a string.
+ */
+export const buildRangeSeparatorRegex = (words: (string | RegExp)[]): RegExp =>
+  new RegExp(`^${buildRangeSeparatorSource(words)}`, 'iu');
+
+/**
+ * Builds a regex that matches any of the given words or patterns at the start of a string.
+ * Strings are matched as whole words followed by whitespace.
+ * RegExp patterns are used as-is for more complex matching (e.g., French elisions).
+ */
+export const buildStripPrefixRegex = (patterns: (string | RegExp)[]): RegExp => {
+  const parts = patterns.map(p =>
+    p instanceof RegExp ? `(?:${p.source})` : `(?:${escapeRegex(p)})\\s+`
+  );
+  return new RegExp(`^(?:${parts.join('|')})`, 'iu');
+};
+
+/**
+ * Builds a regex that matches any of the given words at the end of a string,
+ * preceded by whitespace. Used for trailing quantity context like "from" or "of".
+ */
+export const buildTrailingContextRegex = (words: string[]): RegExp =>
+  new RegExp(`\\s+(?:${words.map(escapeRegex).join('|')})$`, 'iu');
+
+// --- Default i18n Values ---
+
+/**
+ * Default group header prefixes (e.g., "For the icing:").
+ */
+export const defaultGroupHeaderPatterns = ['For'] as const;
+
+/**
+ * Default range separator words (e.g., "1 to 2", "1 or 2").
+ */
+export const defaultRangeSeparators = ['or', 'to'] as const;
+
+/**
+ * Default words to strip from the beginning of descriptions.
+ */
+export const defaultDescriptionStripPrefixes = ['of'] as const;
+
+/**
+ * Default words that indicate trailing quantity context.
+ */
+export const defaultTrailingQuantityContext = ['from', 'of'] as const;
+
 /**
  * Default options for {@link parseIngredient}.
  */
@@ -10,26 +88,42 @@ export const defaultOptions: Required<ParseIngredientOptions> = {
   normalizeUOM: false,
   ignoreUOMs: [],
   decimalSeparator: '.',
+  groupHeaderPatterns: defaultGroupHeaderPatterns as unknown as string[],
+  rangeSeparators: defaultRangeSeparators as unknown as string[],
+  descriptionStripPrefixes: defaultDescriptionStripPrefixes as unknown as (string | RegExp)[],
+  trailingQuantityContext: defaultTrailingQuantityContext as unknown as string[],
+  includeMeta: false,
 } as const;
 
+// --- Legacy Exports (for backward compatibility) ---
+
 /**
- * List of "for" equivalents (for upcoming i18n support).
+ * List of "for" equivalents.
+ * @deprecated Use `defaultGroupHeaderPatterns` instead.
  */
-export const fors = ['For'] as const;
+export const fors: typeof defaultGroupHeaderPatterns = defaultGroupHeaderPatterns;
+
 /**
- * Regex to capture "for" equivalents (for upcoming i18n support).
+ * Regex to capture "for" equivalents.
+ * @deprecated Build dynamically using `buildPrefixPatternRegex(options.groupHeaderPatterns)`.
  */
-export const forsRegEx: RegExp = new RegExp(`^(?:${fors.join('|')})\\s`, 'iu');
+export const forsRegEx: RegExp = buildPrefixPatternRegex(
+  defaultGroupHeaderPatterns as unknown as string[]
+);
 
 /**
- * List of range separators (for upcoming i18n support).
+ * List of range separators.
+ * @deprecated Use `defaultRangeSeparators` instead.
  */
-export const rangeSeparatorWords = ['or', 'to'] as const;
-const rangeSeparatorRegExSource = `(-|–|—|(?:${rangeSeparatorWords.join('|')})\\s)`;
+export const rangeSeparatorWords: typeof defaultRangeSeparators = defaultRangeSeparators;
+
 /**
- * Regex to capture range separators (for upcoming i18n support).
+ * Regex to capture range separators.
+ * @deprecated Build dynamically using `buildRangeSeparatorRegex(options.rangeSeparators)`.
  */
-export const rangeSeparatorRegEx: RegExp = new RegExp(`^${rangeSeparatorRegExSource}`, 'iu');
+export const rangeSeparatorRegEx: RegExp = buildRangeSeparatorRegex(
+  defaultRangeSeparators as unknown as string[]
+);
 
 /**
  * Regex to capture the first word of a description, to see if it's a unit of measure.
@@ -39,31 +133,53 @@ export const firstWordRegEx: RegExp =
 
 const numericRegexAnywhere = numericRegex.source.replace('^', '').replace(/\$$/, '');
 
+/**
+ * Builds a regex to capture trailing quantity and unit of measure,
+ * using the provided range separator words.
+ */
+export const buildTrailingQuantityRegex = (rangeSeparators: (string | RegExp)[]): RegExp => {
+  const rangeSeparatorSource = buildRangeSeparatorSource(rangeSeparators);
+  return new RegExp(
+    `(,|:|-|–|—|x|⨯)?\\s*((${numericRegexAnywhere})\\s*(${rangeSeparatorSource}))?\\s*(${numericRegexAnywhere})\\s*(fl(?:uid)?(?:\\s+|-)(?:oz|ounces?)|[\\p{L}\\p{N}_]+)?$`,
+    'iu'
+  );
+};
+
 /**
  * Regex to capture trailing quantity and unit of measure.
+ * @deprecated Build dynamically using `buildTrailingQuantityRegex(options.rangeSeparators)`.
  */
-export const trailingQuantityRegEx: RegExp = new RegExp(
-  `(,|:|-|–|—|x|⨯)?\\s*((${numericRegexAnywhere})\\s*(${rangeSeparatorRegExSource}))?\\s*(${numericRegexAnywhere})\\s*(fl(?:uid)?(?:\\s+|-)(?:oz|ounces?)|[\\p{L}\\p{N}_]+)?$`,
-  'iu'
+export const trailingQuantityRegEx: RegExp = buildTrailingQuantityRegex(
+  defaultRangeSeparators as unknown as string[]
 );
 
 /**
- * List of "of" equivalents (for upcoming i18n support).
+ * List of "of" equivalents.
+ * @deprecated Use `defaultDescriptionStripPrefixes` instead.
  */
-export const ofs = ['of'] as const;
+export const ofs: typeof defaultDescriptionStripPrefixes = defaultDescriptionStripPrefixes;
+
 /**
- * Regex to capture "of" equivalents at the beginning of a string (for upcoming i18n support).
+ * Regex to capture "of" equivalents at the beginning of a string.
+ * @deprecated Build dynamically using `buildStripPrefixRegex(options.descriptionStripPrefixes)`.
  */
-export const ofRegEx: RegExp = new RegExp(`^(?:${ofs.join('|')})\\s+`, 'iu');
+export const ofRegEx: RegExp = buildStripPrefixRegex(
+  defaultDescriptionStripPrefixes as unknown as string[]
+);
 
 /**
- * List of "from" equivalents (for upcoming i18n support).
+ * List of "from" equivalents.
+ * @deprecated Use `defaultTrailingQuantityContext` instead.
  */
-export const froms = ['from', 'of'] as const;
+export const froms: typeof defaultTrailingQuantityContext = defaultTrailingQuantityContext;
+
 /**
- * Regex to capture "from" equivalents at the end of a string (for upcoming i18n support).
+ * Regex to capture "from" equivalents at the end of a string.
+ * @deprecated Build dynamically using `buildTrailingContextRegex(options.trailingQuantityContext)`.
  */
-export const fromRegEx: RegExp = new RegExp(`\\s+(?:${froms.join('|')})$`, 'iu');
+export const fromRegEx: RegExp = buildTrailingContextRegex(
+  defaultTrailingQuantityContext as unknown as string[]
+);
 
 /**
  * Default unit of measure specifications.
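
To see what the new builders in `src/constants.ts` match, the sketch below copies `escapeRegex`, `buildRangeSeparatorSource`, `buildRangeSeparatorRegex`, and `buildTrailingContextRegex` from this diff (with `export` dropped so the snippet runs standalone) and tests them against a few sample strings. The sample strings are made up for illustration; how the parser feeds text into these regexes is not shown in this commit view.

```typescript
// Copied from this commit's src/constants.ts, without `export`.
const escapeRegex = (str: string): string => str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const buildRangeSeparatorSource = (words: (string | RegExp)[]): string => {
  const wordParts = words.map(w =>
    w instanceof RegExp ? `(?:${w.source})` : `(?:${escapeRegex(w)})`
  );
  // Dashes are always included; word separators must be followed by whitespace.
  return `(-|–|—|(?:${wordParts.join('|')})\\s)`;
};

const buildRangeSeparatorRegex = (words: (string | RegExp)[]): RegExp =>
  new RegExp(`^${buildRangeSeparatorSource(words)}`, 'iu');

const buildTrailingContextRegex = (words: string[]): RegExp =>
  new RegExp(`\\s+(?:${words.map(escapeRegex).join('|')})$`, 'iu');

// German range separators, as in the README example from this commit.
const germanRange = buildRangeSeparatorRegex(['to', 'or', 'bis', 'oder']);
console.log(germanRange.test('bis 2 cups flour')); // true
console.log(germanRange.test('– 2 cups flour')); // true (dashes always match)
console.log(germanRange.test('bisschen Mehl')); // false (no whitespace after "bis")

// Trailing context words are anchored at the end of the string.
const germanContext = buildTrailingContextRegex(['from', 'of', 'von']);
console.log(germanContext.test('Saft von')); // true
console.log(germanContext.test('Saft von Zitronen')); // false
```

Both regexes are built with the `iu` flags, so matching is case-insensitive and the pattern is interpreted in Unicode mode.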
