Added Hindi percentage ITN class #1

Open
mayuris-00 wants to merge 4 commits into RajanPutty:ITN-KT_Percentage-Class from mayuris-00:ITN-KT_Percentage-Class
Conversation

mayuris-00 commented Apr 9, 2026

Summary

Added a new percentage semiotic class to the Hindi ITN pipeline.
The system now converts spoken Hindi percentages to written form:

  • बीस प्रतिशत → २०%
  • सत्तर परसेंट → ७०%
  • पाँच सौ फ़ीसदी → ५००%
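
For reviewers who want to sanity-check the mapping without building the grammars, here is a minimal pure-Python sketch of the conversion the three examples describe. It is not the code in hi/taggers/percentage.py: the vocabulary tables and the function name `spoken_percentage_to_written` are hypothetical and cover only the words shown above, and the number handling is a simple add-and-multiply rule assumed for illustration.

```python
import unicodedata


def _nfc(text):
    # Normalize so precomposed and decomposed nukta forms compare equal.
    return unicodedata.normalize("NFC", text)


# Hypothetical, deliberately tiny vocabulary (only the words in the examples).
NUMBER_WORDS = {_nfc(k): v for k, v in {"पाँच": 5, "बीस": 20, "सत्तर": 70}.items()}
MULTIPLIERS = {_nfc("सौ"): 100}
PERCENT_WORDS = {_nfc(w) for w in ("प्रतिशत", "परसेंट", "फ़ीसदी")}

# Map ASCII digits to Devanagari digits.
TO_DEVANAGARI = str.maketrans("0123456789", "०१२३४५६७८९")


def spoken_percentage_to_written(spoken):
    """Convert e.g. 'बीस प्रतिशत' to '२०%' using the toy tables above."""
    *number_tokens, percent_token = _nfc(spoken).split()
    if percent_token not in PERCENT_WORDS or not number_tokens:
        raise ValueError(f"not a recognised percentage phrase: {spoken!r}")

    value = 0
    for token in number_tokens:
        if token in MULTIPLIERS:
            # A bare multiplier ("सौ") counts as 1 x 100.
            value = max(value, 1) * MULTIPLIERS[token]
        elif token in NUMBER_WORDS:
            value += NUMBER_WORDS[token]
        else:
            raise ValueError(f"unknown number word: {token!r}")

    return str(value).translate(TO_DEVANAGARI) + "%"


if __name__ == "__main__":
    for phrase in ("बीस प्रतिशत", "सत्तर परसेंट", "पाँच सौ फ़ीसदी"):
        print(phrase, "->", spoken_percentage_to_written(phrase))
```

Any phrase outside this toy vocabulary raises `ValueError`, which makes the limits of the sketch explicit.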

Files Added

  • hi/data/percentage/percent_symbol.tsv
  • hi/taggers/percentage.py
  • hi/verbalizers/percentage.py
  • tests/nemo_text_processing/hi/test_percentage.py
  • tests/.../test_cases_percentage.txt

Files Modified

  • hi/taggers/tokenize_and_classify.py
  • hi/verbalizers/verbalize.py

Test Results

All 12 percentage test cases passed.
All existing Hindi ITN tests still pass.
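
As a rough illustration of what a passing run means, the sketch below checks a normalizer against `spoken~written` pairs. The `~`-separated layout is an assumption about test_cases_percentage.txt, the sample lines are just the three examples from the summary, and `parse_cases`, `failing_cases` and the lookup-table normalizer are hypothetical helpers, not the project's test utilities.

```python
# Assumed layout: one case per line, spoken form and written form split by "~".
SAMPLE_CASES = """\
बीस प्रतिशत~२०%
सत्तर परसेंट~७०%
पाँच सौ फ़ीसदी~५००%
"""


def parse_cases(text, sep="~"):
    """Return (spoken, written) pairs, skipping blank lines."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        spoken, written = line.split(sep, 1)
        cases.append((spoken, written))
    return cases


def failing_cases(cases, normalize):
    """Return (spoken, expected, actual) for every case the normalizer gets wrong."""
    failures = []
    for spoken, expected in cases:
        actual = normalize(spoken)
        if actual != expected:
            failures.append((spoken, expected, actual))
    return failures


if __name__ == "__main__":
    cases = parse_cases(SAMPLE_CASES)
    # Stand-in normalizer: a plain lookup table instead of the real ITN pipeline.
    lookup = dict(cases)
    print(failing_cases(cases, lambda spoken: lookup.get(spoken, spoken)))
```

An empty list from `failing_cases` corresponds to every listed case passing.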

Verbose Trace

Input: बीस प्रतिशत
Tagged: tokens { percentage { integer: "२०" percent: "%" } }
Output: २०%

Input: सत्तर परसेंट
Tagged: tokens { percentage { integer: "७०" percent: "%" } }
Output: ७०%
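
To make the tagged-to-output step in the trace concrete, here is a small sketch that reads the tagged string and concatenates the `integer` and `percent` fields. It uses a regular expression purely for illustration; it is not the code in hi/verbalizers/percentage.py, and the name `verbalize_percentage` is hypothetical.

```python
import re

# Matches key: "value" pairs such as  integer: "२०"
FIELD = re.compile(r'(\w+):\s*"([^"]*)"')


def verbalize_percentage(tagged):
    """Turn 'tokens { percentage { integer: "२०" percent: "%" } }' into '२०%'."""
    match = re.search(r"percentage\s*\{(.*?)\}", tagged)
    if match is None:
        raise ValueError(f"no percentage token in: {tagged!r}")
    fields = dict(FIELD.findall(match.group(1)))
    # Written form is the number followed directly by the symbol.
    return fields["integer"] + fields["percent"]


if __name__ == "__main__":
    tagged = 'tokens { percentage { integer: "७०" percent: "%" } }'
    print(verbalize_percentage(tagged))
```

The pattern assumes field values never contain a closing brace, which holds for the two traces above.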

mayuris-00 closed this Apr 9, 2026
mayuris-00 reopened this Apr 9, 2026

Owner

This file should not be here. You already have the correct copy at tests/nemo_text_processing/hi/test_percentage.py. This root-level version will not work -- the relative import `from ..utils import CACHE_DIR` requires the file to be inside the tests/nemo_text_processing/hi/ package. Please delete this file.

Owner

This cdifflib fallback fix is unrelated to the percentage class. Even if it was needed to get your environment working, it should be a separate commit or a separate PR. Mixing unrelated fixes into a feature PR makes the review harder and the git history messier. Please remove this change from this PR and raise it separately if needed.

Author

Reverted the cdifflib change from this PR. Will raise it separately if needed.

2 participants