Add Korean TN support for cardinal numbers and postprocessing#285
mgrafu merged 6 commits into NVIDIA:ko_tn_staging_v1
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
nemo_text_processing/text_normalization/ko/taggers/tokenize_and_classify.py
    digit_except_one = pynini.difference(NEMO_DIGIT, "1")
    digit_except_zero_one = pynini.difference(digit_except_one, "0")

    graph_digit_alt = digit_except_zero_one @ graph_digit
Let's call this something like graph_digit_no_zero_one, so the name says explicitly what the 'alt' variant excludes.
Renamed graph_digit_alt to graph_digit_no_zero_one for clarity as suggested.
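To make the renamed graph's behavior concrete, here is a plain-Python analogy rather than pynini code. It assumes graph_digit maps single digits to Sino-Korean readings; the dictionary is illustrative and not the contents of the repository's data files, and set difference stands in for pynini.difference.

```python
# Illustrative stand-in for graph_digit (assumed contents, not the repo's TSV).
GRAPH_DIGIT = {
    "1": "일", "2": "이", "3": "삼", "4": "사", "5": "오",
    "6": "육", "7": "칠", "8": "팔", "9": "구",
}

NEMO_DIGIT = set("0123456789")

# Mirrors pynini.difference(NEMO_DIGIT, "1"), then the difference with "0".
digit_except_one = NEMO_DIGIT - {"1"}
digit_except_zero_one = digit_except_one - {"0"}

# Mirrors digit_except_zero_one @ graph_digit:
# only inputs accepted by the left-hand side survive the composition.
graph_digit_no_zero_one = {
    digit: reading
    for digit, reading in GRAPH_DIGIT.items()
    if digit in digit_except_zero_one
}

print(sorted(graph_digit_no_zero_one))
```

With the new name, a reader can tell from the call site that "0" and "1" are the inputs being filtered out.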
    graph_digit_alt = digit_except_zero_one @ graph_digit
    graph_ty = pynini.string_file(get_abs_path("data/number/ty.tsv"))
    graph_teen = pynini.string_file(get_abs_path("data/number/teen.tsv"))
Are the rules for creating teens any different from the ones for creating ties? Does it make sense to have a separate file for them, or would it be possible to just add 1 십 to the ties file?
You're right! Since both follow the pattern of combining a base digit with “십”, it makes sense to consolidate them. I’ll go ahead and remove the teen.tsv file and just add 1 십 to the ty.tsv file.
    graph_teen = pynini.string_file(get_abs_path("data/number/teen.tsv"))

    # Compose all basic number forms
    graph_all = (graph_ty + (graph_digit | pynutil.delete('0'))) | graph_teen | graph_digit
This covers numbers from 1 to 99, right? Can we rename the variable for more clarity?
I’ll rename it to graph_1_to_99 for clarity.
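As a rough illustration of what a 1-to-99 graph covers once the teens are folded into the ties table, here is a plain-Python sketch. It is not the PR's pynini code; the DIGIT and TY tables are assumed stand-ins for the data files, and read_1_to_99 is a hypothetical helper name.

```python
# Assumed stand-ins for the digit and ty data (illustrative only).
DIGIT = {
    "1": "일", "2": "이", "3": "삼", "4": "사", "5": "오",
    "6": "육", "7": "칠", "8": "팔", "9": "구",
}
# The "1" entry plays the role of the former teen.tsv rule.
TY = {
    "1": "십", "2": "이십", "3": "삼십", "4": "사십", "5": "오십",
    "6": "육십", "7": "칠십", "8": "팔십", "9": "구십",
}


def read_1_to_99(text: str) -> str:
    """Mirror (graph_ty + (graph_digit | delete '0')) | graph_digit."""
    if len(text) == 1 and text in DIGIT:
        return DIGIT[text]
    if len(text) == 2 and text[0] in TY:
        tens, ones = text[0], text[1]
        if ones == "0":
            return TY[tens]  # the trailing zero is deleted
        if ones in DIGIT:
            return TY[tens] + DIGIT[ones]
    raise ValueError(f"not a number from 1 to 99: {text!r}")


for sample in ("7", "10", "15", "20", "99"):
    print(sample, read_1_to_99(sample))
```

Inputs such as "0" or "05" are rejected, which matches the idea that the graph only accepts 1 through 99.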
    graph_hundred_thousand = hundred_thousands @ graph_hundred_thousand_component

    millions = NEMO_DIGIT**7
    graph_million_component = ((NEMO_DIGIT**3 @ graph_hundred_component) + pynutil.insert('만')) + pynini.union(
(NEMO_DIGIT**3 @ graph_hundred_component) == graph_hundred, so let's use the same variable instead of redefining the rule (this applies to the other _component graphs too).
Updated the code to use graph_hundred and graph_thousand instead of (NEMO_DIGIT**3 @ graph_hundred_component) and (NEMO_DIGIT**4 @ graph_thousand_component).
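The reuse idea can be sketched in plain Python: one helper reads anything below 10,000, and the seven-digit rule calls it twice around an inserted 만 instead of restating the lower-order rules. This is an analogy, not the PR's FST code. The helper names are hypothetical, and two points are assumptions about the intended output: a coefficient of 1 is dropped before 천, 백, and 십, and no spaces are inserted.

```python
DIGIT = {1: "일", 2: "이", 3: "삼", 4: "사", 5: "오", 6: "육", 7: "칠", 8: "팔", 9: "구"}
UNITS = ((1000, "천"), (100, "백"), (10, "십"))


def read_below_10000(n: int) -> str:
    """Shared rule for 0..9999; returns '' for 0 so callers can concatenate."""
    if not 0 <= n < 10000:
        raise ValueError(n)
    parts = []
    for value, unit in UNITS:
        count, n = divmod(n, value)
        if count == 1:
            parts.append(unit)  # assumed: 100 reads as 백, not 일백
        elif count > 1:
            parts.append(DIGIT[count] + unit)
    if n:
        parts.append(DIGIT[n])
    return "".join(parts)


def read_seven_digits(text: str) -> str:
    """Seven digits = (first 3 digits) + 만 + (last 4 digits), reusing one helper."""
    if len(text) != 7 or not (text.isascii() and text.isdigit()) or text[0] == "0":
        raise ValueError(text)
    return read_below_10000(int(text[:3])) + "만" + read_below_10000(int(text[3:]))


print(read_seven_digits("1234567"))
```

Because both halves go through the same helper, a fix to the lower-order reading applies everywhere at once, which is the point of the review comment.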
    # FST
    graph_num = pynini.union(
        graph_thousand_trillions,
Let's add a test case in test_cases_cardinal.txt for each of these rules.
Sure! I’ll add one test case for each of these rules in test_cases_cardinal.txt.
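A table-driven layout makes "one case per rule" easy to audit. The sketch below assumes each line of the test file has the form written~spoken; that separator, the sample lines, and the parse_test_cases helper are assumptions for illustration, not the contents of test_cases_cardinal.txt.

```python
def parse_test_cases(text: str) -> list[tuple[str, str]]:
    """Split 'written~spoken' lines into (written, spoken) pairs, skipping blanks."""
    cases = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        written, spoken = line.split("~", 1)
        cases.append((written, spoken))
    return cases


# Illustrative sample only; real expected outputs depend on the grammar.
SAMPLE = """\
7~칠
15~십오
99~구십구
"""

cases = parse_test_cases(SAMPLE)
for written, spoken in cases:
    print(f"{written} -> {spoken}")
```

Keeping one line per magnitude rule means a missing rule shows up as a missing line rather than as a silent gap in coverage.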
    ).optimize()

    # Sign and final formatting
    optional_sign = pynini.closure(pynutil.insert('negative: "true" ') + pynini.cross("-", ""), 0, 1)
Let's also add a few test cases in test_cases_cardinal.txt for negative numbers.
I'll add some negative cases as well.
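For reference when writing the negative cases, this plain-Python sketch mimics what the optional_sign rule does to its input: a leading "-" is consumed and a negative: "true" field is emitted, and anything else passes through unchanged. The function name is hypothetical, and the sketch covers only the sign, not the rest of the tagger output.

```python
def tag_optional_sign(text: str) -> tuple[str, str]:
    """Return (emitted prefix, remaining input), applying the sign rule at most once."""
    if text.startswith("-"):
        # Mirrors insert('negative: "true" ') + cross("-", "")
        return 'negative: "true" ', text[1:]
    # Mirrors the zero-repetition branch of closure(..., 0, 1)
    return "", text


print(tag_optional_sign("-15"))
print(tag_optional_sign("15"))
```

Useful negative cases would therefore include at least one signed and one unsigned input per magnitude of interest.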
    runtest $input
    }

    #testTNSpecialText() {
Let's remove the commented-out tests for now and add them back as we develop each class.
Deleted all the commented-out tests.
…feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
LGTM -- let's just get CI running properly
…#285)
* Add Korean TN support for cardinal numbers and postprocessing
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add __init__.py to ko/data directory
* Update KO_TN_CACHE to trigger Korean CI run
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
* Add Korean TN support for cardinal numbers and postprocessing (#285)
  * Add Korean TN support for cardinal numbers and postprocessing
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Refactor Korean TN cardinal and postprocessing logic based on review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Add __init__.py to ko/data directory
  * Update KO_TN_CACHE to trigger Korean CI run
* Korean Ordinal TN support (#286)
  * Add Korean TN support for cardinal numbers and postprocessing
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Refactor Korean TN cardinal and postprocessing logic based on review feedback
  * Add Korean Ordinal TN logic and test cases
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
  * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer
  * Add support for 0 in ordinal tagger
  * Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Remove .far files
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/ordinal): update ordinal FST based on review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN Decimal Support (#303)
  * feat(ko/decimal): add Korean decimal TN support
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * feat(ko): Add fraction tagger and verbalizer with tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko): Update decimal and fraction taggers
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN for Date and Time (#316)
  * feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/date): update date tagger and sparrowhawk test
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Date TN fixes & cleanup
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Add Time tagger/verbalizer + tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko(TN): Date: strict YYYY for delimited formats; define single-year 1–4 digit behavior
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN for Money and Telephone (#324)
  * feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * fix(ko/money): polish tagger/verbalizer & expand tests
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: refactor money/telephone taggers & verbalizers
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * ko: update money/telephone taggers and telephone verbalizer
  * ko: update telephone taggers
* Korean TN for Measure and Electronic (#353)
  * Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data)
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Update KO electronic & measure taggers/verbalizers and test cases
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Edited as per review feedback
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
* Korean TN fixes: cardinal, decimal, fraction, date
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add ko electronic extensions and improve electronic/telephone normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix Korean TN issues and update test cases
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Fix Korean TN electronic and post-processing issues
* Fix Korean TN spacing and electronic/cardinal handling
* Fix optional token separator and remove redundant whitespace normalization
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Remove unused KO post_processing and update exporter
* Add native counting support for number+counter in Korean TN
* [pre-commit.ci] auto fixes from pre-commit.com hooks
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…#285) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add __init__.py to ko/data directory Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update KO_TN_CACHE to trigger Korean CI run Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add Korean TN support for cardinal numbers and postprocessing (#285) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add __init__.py to ko/data directory Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update KO_TN_CACHE to trigger Korean CI run Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean Ordinal TN support (#286) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add Korean Ordinal TN logic and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add support for 0 in ordinal tagger Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file Signed-off-by: 
Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove .far files Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/ordinal): update ordinal FST based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN Decimal Support (#303) * feat(ko/decimal): add Korean decimal TN support Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat(ko): Add fraction tagger and verbalizer with tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko): Update decimal and fraction taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Date and Time (#316) * feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/date): update date tagger and sparrowhawk test Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date TN fixes & cleanup Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Add Time tagger/verbalizer + tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Money and Telephone (#324) * feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/money): polish tagger/verbalizer & expand tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: refactor money/telephone taggers & verbalizers Signed-off-by: Jinwoo 
Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: update money/telephone taggers and telephone verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * ko: update telephone taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Measure and Electronic (#353) * Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data) Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update KO electronic & measure taggers/verbalizers and test cases Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Edited as per review feedback Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN fixes: cardinal, decimal, 
fraction, date (#374) * Korean TN fixes: cardinal, decimal, fraction, date Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add ko electronic extensions and improve electronic/telephone normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Korean TN issues and update test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Korean TN electronic and post-processing issues Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN spacing and electronic/cardinal handling Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix optional token separator and remove redundant whitespace normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused KO post_processing and update exporter Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * KO TN fixes (DCO remediation + MRC updates) (#389) * Add Korean TN support for cardinal numbers and postprocessing (#285) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add __init__.py to ko/data directory Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update KO_TN_CACHE to trigger Korean CI run Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- 
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean Ordinal TN support (#286) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add Korean Ordinal TN logic and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add support for 0 in ordinal tagger Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove .far files Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/ordinal): update ordinal FST based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN Decimal Support (#303) * feat(ko/decimal): add Korean decimal TN support Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat(ko): Add fraction tagger and verbalizer with tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com 
hooks for more information, see https://pre-commit.ci * fix(ko): Update decimal and fraction taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Date and Time (#316) * feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/date): update date tagger and sparrowhawk test Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date TN fixes & cleanup Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Add Time tagger/verbalizer + tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Money and Telephone (#324) * feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/money): polish tagger/verbalizer & expand tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: add Telephone TN (tagger+verbalizer) + wire + 
tests; include money/test updates Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: refactor money/telephone taggers & verbalizers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: update money/telephone taggers and telephone verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * ko: update telephone taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN for Measure and Electronic (#353) * Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data) Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update KO electronic & measure taggers/verbalizers and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Edited as per review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Korean TN fixes: cardinal, decimal, fraction, date Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add ko electronic extensions and improve electronic/telephone normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from 
pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN issues and update test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN electronic and post-processing issues Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN spacing and electronic/cardinal handling Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix optional token separator and remove redundant whitespace normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Remove unused KO post_processing and update exporter Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add native counting support for number+counter in Korean TN Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * chore: retrigger checks Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…#285)
* Add Korean TN support for cardinal numbers and postprocessing
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Refactor Korean TN cardinal and postprocessing logic based on review feedback
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Add __init__.py to ko/data directory
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Update KO_TN_CACHE to trigger Korean CI run
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add Korean TN support for cardinal numbers and postprocessing (#285) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add __init__.py to ko/data directory Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update KO_TN_CACHE to trigger Korean CI run Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean Ordinal TN support (#286) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add Korean Ordinal TN logic and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add support for 0 in ordinal tagger Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove .far files Signed-off-by: 
Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/ordinal): update ordinal FST based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean TN Decimal Support (#303) * feat(ko/decimal): add Korean decimal TN support Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * feat(ko): Add fraction tagger and verbalizer with tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko): Update decimal and fraction taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean TN for Date and Time (#316) * feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/date): update date tagger and sparrowhawk test Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date TN fixes & cleanup Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Add Time tagger/verbalizer + tests 
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean TN for Money and Telephone (#324) * feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(ko/money): polish tagger/verbalizer & expand tests Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: refactor money/telephone taggers & verbalizers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * ko: update money/telephone taggers and telephone verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * ko: update telephone taggers Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean TN for Measure and Electronic (#353) * Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data) Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update KO electronic & measure taggers/verbalizers and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Edited as per review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean TN fixes: cardinal, decimal, fraction, date (#374) * Korean TN fixes: cardinal, decimal, fraction, date Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add ko electronic extensions and improve electronic/telephone normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Korean TN issues and update test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix Korean TN electronic and post-processing issues Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix Korean TN spacing and electronic/cardinal handling Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Fix optional token separator and remove redundant whitespace normalization Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove 
unused KO post_processing and update exporter Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * KO TN fixes (DCO remediation + MRC updates) (#389) * Add Korean TN support for cardinal numbers and postprocessing (#285) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add __init__.py to ko/data directory Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Update KO_TN_CACHE to trigger Korean CI run Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> --------- Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Korean Ordinal TN support (#286) * Add Korean TN support for cardinal numbers and postprocessing Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor Korean TN cardinal and postprocessing logic based on review feedback Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add Korean Ordinal TN logic and test cases Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Refactor ordinal logic (1-39, 40+) and add word tagger and verbalizer Signed-off-by: Jinwoo Bae <bbae7050@gmail.com> * Add support for 0 in ordinal tagger Signed-off-by: 
Jinwoo Bae <bbae7050@gmail.com>
* Update ordinal.py to exclude digit 1 in code and remove unnecessary TSV file
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Remove .far files
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix(ko/ordinal): update ordinal FST based on review feedback
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Korean TN Decimal Support (#303)
* feat(ko/decimal): add Korean decimal TN support
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* feat(ko): Add fraction tagger and verbalizer with tests
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix(ko): Update decimal and fraction taggers
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Korean TN for Date and Time (#316)
* feat(ko/date): Add date TN taggers, verbalizers, test cases, and post-processing fixes
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix(ko/date): update date tagger and sparrowhawk test
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko(TN): Date TN fixes & cleanup
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko(TN): Add Time tagger/verbalizer + tests
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko(TN): Date — strict YYYY for delimited formats; define single-year 1–4 digit behavior
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Korean TN for Money and Telephone (#324)
* feat(ko/money): Korean Money TN only; add data & tests; wire tagger/verbalizer
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix(ko/money): polish tagger/verbalizer & expand tests
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko: add Telephone TN (tagger+verbalizer) + wire + tests; include money/test updates
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko: refactor money/telephone taggers & verbalizers
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko/money: use NEMO_NOT_QUOTE, lowercase space helper, trim mid optimizes
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* ko: update money/telephone taggers and telephone verbalizer
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* ko: update telephone taggers
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Korean TN for Measure and Electronic (#353)
* Add: Korean Measure & Electronic TN (taggers, verbalizers, tests, data)
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update KO electronic & measure taggers/verbalizers and test cases
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Edited as per review feedback
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Korean TN fixes: cardinal, decimal, fraction, date
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add ko electronic extensions and improve electronic/telephone normalization
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN issues and update test cases
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN electronic and post-processing issues
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix Korean TN spacing and electronic/cardinal handling
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Fix optional token separator and remove redundant whitespace normalization
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Remove unused KO post_processing and update exporter
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* Add native counting support for number+counter in Korean TN
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* chore: retrigger checks
  Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
---------
Signed-off-by: Jinwoo Bae <bbae7050@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
What does this PR do?
Classify and verbalize grammars
Unit tests with coverage up to 17-digit numbers
Support for spacing around units (억, 만, 조, 경)
Post-processing logic for formatting
Unit tests and Sparrowhawk tests
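As a rough illustration of the four-digit grouping behind the large units listed above (만, 억, 조, 경), the sketch below reads an integer in Sino-Korean using plain Python rather than the pynini grammars in this PR. The names `read_group` and `read_cardinal`, the `sep` parameter, and the choice to read 10000 as 만 rather than 일만 are assumptions made for illustration; the PR's actual output conventions may differ.

```python
# Illustrative sketch only: plain Python, not the PR's pynini FSTs.
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
SMALL_UNITS = ["", "십", "백", "천"]        # positions inside a 4-digit group
LARGE_UNITS = ["", "만", "억", "조", "경"]  # 10^0, 10^4, 10^8, 10^12, 10^16


def read_group(n: int) -> str:
    """Read 1..9999; '일' is dropped before 십/백/천 (10 -> 십, not 일십)."""
    parts = []
    for pos in range(3, -1, -1):
        d = (n // 10 ** pos) % 10
        if d == 0:
            continue
        if d == 1 and pos > 0:
            parts.append(SMALL_UNITS[pos])
        else:
            parts.append(DIGITS[d] + SMALL_UNITS[pos])
    return "".join(parts)


def read_cardinal(n: int, sep: str = "") -> str:
    """Read a non-negative integer below 10^20; `sep` is placed between groups."""
    if n == 0:
        return "영"
    if not 0 < n < 10 ** 20:
        raise ValueError("supported range is 0 <= n < 10**20")
    groups = []
    idx = 0
    while n > 0:
        n, g = divmod(n, 10_000)
        if g:
            # Assumed convention: a bare 10^4 group is read 만, not 일만.
            text = "" if (g == 1 and idx == 1) else read_group(g)
            groups.append(text + LARGE_UNITS[idx])
        idx += 1
    return sep.join(reversed(groups))


print(read_cardinal(12345678, sep=" "))
```

Passing `sep=" "` inserts a space after each large-unit group, which is one possible reading of the "spacing around units" item; the default joins groups without spaces.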
Before your PR is "Ready for review"
Pre checks:
Use git commit -s to sign.
Run pytest or (if your machine does not have GPU) pytest --cpu from the root folder (given you marked your test cases accordingly @pytest.mark.run_only_on('CPU')).
Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
Test cases for both pytest and Sparrowhawk here.
Have you added __init__.py for every folder and subfolder, including the data folder, which has .TSV files?
Have you added the license header Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved. to all newly added Python files?
Copyright 2015 and onwards Google, Inc. See an example here.
Remove import guards (try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.