Skip to content

feat: strip leading quantitiy prefixes + improved uom regex#49

Merged
jakeboone02 merged 12 commits into
jakeboone02:mainfrom
AfoxDesignz:main
Apr 20, 2026
Merged

feat: strip leading quantitiy prefixes + improved uom regex#49
jakeboone02 merged 12 commits into
jakeboone02:mainfrom
AfoxDesignz:main

Conversation

@AfoxDesignz

Copy link
Copy Markdown
Contributor
  • Add support for stripping leading quantity prefixes such as "about", "ca." and "bis zu" from ingredients.

  • Enhanced regex patterns to support additional plural markers for units of measure (e.g., slash or parenthetical forms like "Scheibe/n", "Scheibe(n)", often used on germanys most popular recipe site chefkoch.de).

  • Add tests for both features

  • Update README to document the new leadingQuantityPrefixes option.

@codesandbox-ci

Copy link
Copy Markdown

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit af2d03f:

Sandbox Source
parse-ingredient-ci Configuration

@codecov

codecov Bot commented Feb 21, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (78fe644) to head (4f0cddf).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main       #49    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            5         5            
  Lines         2373      2611   +238     
==========================================
+ Hits          2373      2611   +238     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@AfoxDesignz

Copy link
Copy Markdown
Contributor Author

Hey @jakeboone02 have you had time to review the changes yet?

@jakeboone02

Copy link
Copy Markdown
Owner

Sorry for the silence lately! I'm traveling abroad for work and won't really have a chance to look at my open source projects for probably a couple more weeks.

@AfoxDesignz

Copy link
Copy Markdown
Contributor Author

Sorry for the silence lately! I'm traveling abroad for work and won't really have a chance to look at my open source projects for probably a couple more weeks.

Alright, then have fun :)

@jakeboone02

Copy link
Copy Markdown
Owner

Updated based on this initial review by Copilot:


Alignment with codebase ✅

The PR follows established patterns well:

  • New option in ParseIngredientOptions with JSDoc + @default/@example
  • New build*Regex function in constants.ts, matching existing builders
  • Default constant (defaultLeadingQuantityPrefixes = []) added like others
  • Type (string | RegExp)[] consistent with descriptionStripPrefixes
  • meta.sourceText correctly preserves the original line, not the stripped one
  • Tests and README both updated

Issues & concerns

1. Potential crash on empty q2Portion — In the range-parsing section, q2Portion[0] will be undefined if stripLeadingQuantityPrefixes returns an empty string. This existed in the original code too, but the PR makes it slightly more likely since aggressive prefixes could strip content down to nothing.

2. /$^/u "never-match" regex is unconventional — The other build* functions don't guard against empty arrays. This is actually better behavior (returning a regex that matches nothing when no patterns exist), but the inconsistency could confuse future contributors. A comment explaining the choice would help.

3. \s* vs \s+ inconsistencybuildStripPrefixRegex uses \s+ (requires whitespace after string prefixes), but buildLeadingQuantityPrefixRegex uses \s* (optional). This is intentional (to handle ca.200g), but the divergence from the sister function isn't documented.

4. Infinite loop safety is subtlestripLeadingQuantityPrefixes relies on !match[0] to break on zero-length matches. This works (empty string is falsy), but is easy to misread. A comment would help.

5. Interaction between rangeSeparators and leadingQuantityPrefixes — Input like "3 bis zu 5 EL" with rangeSeparators: ['bis'] and leadingQuantityPrefixes: ['bis zu'] would leave "zu 5 EL" after range extraction. The prefix regex won't strip "zu" alone. This is a configuration-level nuance, not a bug, but worth a note in the README.

6. Ordering sensitivity — Providing ['ca', 'ca.'] would match "ca" first in "ca. 200g", leaving ". 200g". Standard regex alternation behavior, but worth documenting that longer/more-specific patterns should come first.

Regex changes (slash/paren UOM) ✅

The firstWordRegEx and buildTrailingQuantityRegex changes ([.-][./-] + \(…\) alternative) are well-scoped. Adding / to the word-separator class won't affect quantity fraction parsing since these regexes operate on the description portion after quantities are extracted.

Verdict

Solid contribution that aligns with the codebase's i18n strategy. The main actionable items are: add a guard for empty q2Portion, and add a couple of inline comments for the non-obvious /$^/ and zero-length-match safety patterns. Everything else is minor documentation polish.

@jakeboone02 jakeboone02 merged commit 23f70d4 into jakeboone02:main Apr 20, 2026
3 checks passed
@jakeboone02

Copy link
Copy Markdown
Owner

Thanks for this contribution! It's included in v2.2.0.

@AfoxDesignz

Copy link
Copy Markdown
Contributor Author

Thanks for this contribution! It's included in v2.2.0.

Thanks for the Review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants