Commit b5fbfe1
KO TN fixes (DCO remediation + MRC updates) (NVIDIA#389)
* Add Korean TN support for cardinal numbers and postprocessing (NVIDIA#285)
* Add Korean TN support for cardinal numbers and postprocessing
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add __init__.py to ko/data directory
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Update KO_TN_CACHE to trigger Korean CI run
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean Ordinal TN support (NVIDIA#286)
* Add Korean TN support for cardinal numbers and postprocessing
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add Korean Ordinal TN logic and test cases
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add support for 0 in ordinal tagger
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove .far files
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(ko/ordinal): update ordinal FST based on review feedback
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean TN Decimal Support (NVIDIA#303)
* feat(ko/decimal): add Korean decimal TN support
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* feat(ko): Add fraction tagger and verbalizer with tests
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(ko): Update decimal and fraction taggers
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean TN for Date and Time (NVIDIA#316)
* feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(ko/date): update date tagger and sparrowhawk test
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko(TN): Date TN fixes & cleanup
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko(TN): Add Time tagger/verbalizer + tests
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean TN for Money and Telephone (NVIDIA#324)
* feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fix(ko/money): polish tagger/verbalizer & expand tests
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko: refactor money/telephone taggers & verbalizers
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* ko: update money/telephone taggers and telephone verbalizer
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* ko: update telephone taggers
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean TN for Measure and Electronic (NVIDIA#353)
* Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data)
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Update KO electronic & measure taggers/verbalizers and test cases
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Edited as per review feedback
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Korean TN fixes: cardinal, decimal, fraction, date
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add ko electronic extensions and improve electronic/telephone normalization
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN issues and update test cases
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN electronic and post-processing issues
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN spacing and electronic/cardinal handling
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix optional token separator and remove redundant whitespace normalization
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Remove unused KO post_processing and update exporter
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add native counting support for number+counter in Korean TN
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
1 parent fb7c0db, commit b5fbfe1
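The Date TN change in NVIDIA#316 ("strict YYYY for delimited formats") can be illustrated with a regex that accepts a delimited date only when the year has exactly four digits, so `24-06-10` is left alone. This is a hypothetical plain-Python sketch, not the repository's pynini FST. The Sino-Korean reading helper, the irregular month forms 유월 and 시월, the `년`/`월`/`일` spacing, and all names below are assumptions.

```python
import re

SINO_DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SINO_UNITS = [(1000, "천"), (100, "백"), (10, "십")]
IRREGULAR_MONTHS = {6: "유월", 10: "시월"}  # assumed irregular month readings

# Hypothetical pattern: exactly four year digits, same separator used twice.
DELIMITED_DATE = re.compile(r"(?<!\d)(\d{4})([-./])(\d{1,2})\2(\d{1,2})(?!\d)")


def sino_korean(n: int) -> str:
    """Read 1..9999 with Sino-Korean numerals; a leading 1 before 십/백/천 is dropped."""
    if not 1 <= n <= 9999:
        raise ValueError("sketch only covers 1..9999")
    parts = []
    for value, unit in SINO_UNITS:
        digit, n = divmod(n, value)
        if digit:
            parts.append(("" if digit == 1 else SINO_DIGITS[digit]) + unit)
    if n:
        parts.append(SINO_DIGITS[n])
    return "".join(parts)


def read_month(month: int) -> str:
    return IRREGULAR_MONTHS.get(month, sino_korean(month) + "월")


def normalize_dates(text: str) -> str:
    """Verbalize YYYY-MM-DD style dates; other shapes are left unchanged."""
    def repl(match):
        year, month, day = int(match.group(1)), int(match.group(3)), int(match.group(4))
        if year < 1 or not 1 <= month <= 12 or not 1 <= day <= 31:
            return match.group(0)  # leave implausible dates untouched
        return f"{sino_korean(year)}년 {read_month(month)} {sino_korean(day)}일"

    return DELIMITED_DATE.sub(repl, text)
```

For example, `normalize_dates("2024-06-10")` yields `이천이십사년 유월 십일`, while a two-digit year such as `24-06-10` does not match and passes through unchanged.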
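For the Telephone TN added in NVIDIA#324, a common Korean reading is digit by digit with 0 read as 공. The sketch below assumes that convention and hyphen-delimited numbers such as `010-1234-5678`, and separates groups with a space. The pattern and the names `read_digits` and `normalize_phone` are hypothetical, and the repository's actual tagger/verbalizer output format may differ.

```python
import re

# Digit-by-digit Sino-Korean reading; 0 is read as 공 (assumed convention).
DIGIT_READINGS = {"0": "공", "1": "일", "2": "이", "3": "삼", "4": "사",
                  "5": "오", "6": "육", "7": "칠", "8": "팔", "9": "구"}

# Hypothetical pattern for numbers such as 02-123-4567 or 010-1234-5678.
PHONE_PATTERN = re.compile(r"(?<!\d)(0\d{1,2})-(\d{3,4})-(\d{4})(?!\d)")


def read_digits(group: str) -> str:
    """Read a run of digits one at a time."""
    return "".join(DIGIT_READINGS[d] for d in group)


def normalize_phone(text: str) -> str:
    """Read each hyphen-separated group digit by digit, joining groups with spaces."""
    return PHONE_PATTERN.sub(
        lambda m: " ".join(read_digits(g) for g in m.groups()), text
    )
```

Under these assumptions, `010-1234-5678` becomes `공일공 일이삼사 오육칠팔`.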
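The last squashed change adds native counting for number+counter pairs. Linguistically, this means a number directly before a counter such as 개 or 명 is read with native Korean numerals in their attributive forms (한, 두, 세, 네, 스무) instead of Sino-Korean ones. The plain-Python sketch below illustrates that rule only; it is not the NeMo FST. The 1-99 range, the counter list, and the names `native_attributive` and `normalize_counts` are hypothetical choices for this example.

```python
import re

# Native Korean numerals in the attributive form used directly before a counter.
NATIVE_ONES = {1: "한", 2: "두", 3: "세", 4: "네", 5: "다섯",
               6: "여섯", 7: "일곱", 8: "여덟", 9: "아홉"}
NATIVE_TENS = {1: "열", 2: "스물", 3: "서른", 4: "마흔", 5: "쉰",
               6: "예순", 7: "일흔", 8: "여든", 9: "아흔"}

# Hypothetical subset of counters that take native numerals.
COUNTERS = ("개", "명", "살", "마리", "권")
# The lookbehind stops "123개" from being split into "1" + "23개".
COUNT_PATTERN = re.compile(r"(?<!\d)(\d{1,2})(" + "|".join(COUNTERS) + ")")


def native_attributive(n: int) -> str:
    """Return the native Korean attributive numeral for 1..99."""
    if not 1 <= n <= 99:
        raise ValueError("sketch only covers 1..99")
    tens, ones = divmod(n, 10)
    if tens == 2 and ones == 0:
        return "스무"  # 20 contracts from 스물 before a counter
    return NATIVE_TENS.get(tens, "") + NATIVE_ONES.get(ones, "")


def normalize_counts(text: str) -> str:
    """Rewrite '<1-2 digits><counter>' as '<native numeral> <counter>'."""
    def repl(match):
        n = int(match.group(1))
        if not 1 <= n <= 99:
            return match.group(0)  # e.g. "0개" is left as-is
        return f"{native_attributive(n)} {match.group(2)}"

    return COUNT_PATTERN.sub(repl, text)
```

With these assumptions, `normalize_counts("사과 3개와 학생 21명")` gives `사과 세 개와 학생 스물한 명`. Numbers with three or more digits are deliberately left untouched, since reading them natively is uncommon.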
File tree
4 files changed (+61, -5 lines)
- nemo_text_processing/text_normalization/ko
  - data/number
  - taggers
  - verbalizers

Lines changed per file:
- 16 additions & 0 deletions
- 9 additions & 0 deletions
- 27 additions & 1 deletion
- 9 additions & 4 deletions