Commit f2664c1

BuyuanCui, mgrafu, anand-nv, ekmb, and pre-commit-ci[bot] authored
Zh tn bug 240712 (#187)
* IT TN improvement on tests (#120) * add missing test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix bug with time tests Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update ci date Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add sentence test cases Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * refine shortest path for irregular cardinals Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update ci date Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add single letter exception for roman numerals (#121) * add single letter exception for roman numerals Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update ci dir Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix broken path for nondet whitelist (#124) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Increase weights for serial (en TN) (#128) * Increase weights for serial (en TN) Resolves https://github.com/NVIDIA/NeMo-text-processing/issues/126 Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Add tests for fix Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Update Jenkinsfile cache path Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Update Jenkinsfile. 
Fix cache folder Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measures file for FR TN (#131) * add measures file Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * update whitelist data Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * add fr tn tests Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Sh jenkins (#127) * Add SH tests to Jenkins Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkins tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add CI/CD tests for sparrowhawk Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * docker build only if in test mode Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix missing variable Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix comments and remove arguments not required Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix commands not executing Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Missing arguments Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Missing quotes Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix incorrect path for tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Incorrect paths of tests and shunit2 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix issues with paths as arguments to shunit Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Undo path change Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix intentional fail test Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * revert redundant check for cased option Signed-off-by: Anand Joseph 
<anajoseph@nvidia.com> * Fix default path in export_grammars.sh Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache paths Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add interactive option Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add SH tests for cased EN ITN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update isort - fix precommit (#138) * update isort version Signed-off-by: Evelina <ebakhturina@nvidia.com> * update isort version Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix format Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused imports Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Armenian itn (#136) * Added Armenian ITN Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * Added Armenian ITN Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * Added Armenian ITN Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * Added context for tests and fixed CodeQL errors Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * Revert "Added context for tests and fixed CodeQL errors" This reverts commit 2c804d941963c0be21d3aad07e6cd13568ab747b. 
Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * Added context to some test files and fixed CodeQL errors Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * deleted unnecessary data Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * translated a few measurements to Armenian Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * adjusted some things for better readability and maintainer support Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed one test case and some issues Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix CI (#142) * fix whitelist deployment Signed-off-by: Evelina <ebakhturina@nvidia.com> * clean up Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * comment out tests to recreate grammars Signed-off-by: Evelina <ebakhturina@nvidia.com> * shorten test Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix jenkins Signed-off-by: Evelina <ebakhturina@nvidia.com> * cased for TN Signed-off-by: Evelina <ebakhturina@nvidia.com> * revert debug changes Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix args default Signed-off-by: Evelina <ebakhturina@nvidia.com> * try parallel Signed-off-by: Evelina <ebakhturina@nvidia.com> * debug parallel Signed-off-by: Evelina <ebakhturina@nvidia.com> * rerun Signed-off-by: Evelina 
<ebakhturina@nvidia.com> * rerun Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix sh tests for local SH launcher Signed-off-by: Evelina <ebakhturina@nvidia.com> * enable all ci tests Signed-off-by: Evelina <ebakhturina@nvidia.com> * enable all ci tests Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Armenian TN (#137) * merged with main branch and fixed conflicts Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * fixing conflicts Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * fixing some more conflicts Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * fixed a minor issue Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * deleted unused imports Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix: add "hy" language option for armenian Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com> * added optional space for measurements after cardinals/decimals Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * added Armenian dot Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: David Sargsyan <d.sargsyan@ispras.ru> Signed-off-by: Ara Yeroyan <60027241+Ara-Yeroyan@users.noreply.github.com> Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: David Sargsyan <d.sargsyan@ispras.ru> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ara Yeroyan 
<60027241+Ara-Yeroyan@users.noreply.github.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Marathi ITN (#134) * Added Marathi ITN Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adding jenkins test Signed-off-by: Travis Bartley <tbartley@nvidia.com> --------- Signed-off-by: Chinmay Patil <chinmaypatil2000@gmail.com> Signed-off-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Signed-off-by: Travis Bartley <tbartley@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: tbartley94 <90423858+tbartley94@users.noreply.github.com> Co-authored-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * jenkins fix (#150) * jenkins fix Signed-off-by: Travis Bartley <tbartley@nvidia.com> * removing armenian to troubleshoot jenkins Signed-off-by: Travis Bartley <tbartley@nvidia.com> * removing armenian to troubleshoot jenkins Signed-off-by: Travis Bartley <tbartley@nvidia.com> * missing _init_ for python Signed-off-by: Travis Bartley <tbartley@nvidia.com> * mislabled cache Signed-off-by: Travis Bartley <tbartley@nvidia.com> --------- Signed-off-by: Travis Bartley <tbartley@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * r0.3.0 release (#151) Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Fix text=line[text] to text=line[text_field] (#153) Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * use real string on docstring (#157) Signed-off-by: Kevin Sanders <kevin.sanders@dialpad.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Sh postprocess (#147) * 
Add support for postprocessor far in sparrowhawk Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Choose between having a post processor or not Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * update run_evaluate script for cased itn (#164) * update run_evaluate script for cased itn Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * remove unused function from ar tn decimals (#165) * remove unused function from ar tn decimals Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> * update ci date Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * ZH sentence-level TN (#112) * Swedish telephone fix (#60) * port fix for telephone from swedish-itn branch Signed-off-by: Jim O'Regan <joregan@kth.se> * extend cardinal in non-deterministic mode Signed-off-by: Jim O'Regan <joregan@kth.se> * whitespace fixes Signed-off-by: Jim O'Regan <joregan@kth.se> * also fix in the verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * Update Jenkinsfile Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: 
Jim O’Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * log instead of print in graph_utils.py (#68) Signed-off-by: Enno Hermann <enno.hermann@idiap.ch> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * CER estimation speedup for audio-based text normalization (#73) * Replaced jiwer with editdistance to speed up CER estimation Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Vitaly Lavrukhin <vlavrukhin@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add measure coverage for TN and ITN (#62) * add measure coverage for TN and ITN Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * Remove unused imports Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update measure.py Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> --------- Signed-off-by: ealbasiri <ealbasiri@gradcenter.cuny.edu> Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * upload es-ES, es-LA, fr-FR and it-IT g2p dicts (#63) * upload es-ES and fr-FR g2p dicts Signed-off-by: Mariana 
Graterol Fuenmayor <mgrafu@gmail.com> * add inits Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add NALA Spanish dict Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * rename Spanish and French dictionaries Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add Italian dictionary Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * add country codes from hu (#77) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix electronic case for username (#75) * fix electronic username w/o . Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable sv tests Signed-off-by: Evelina <ebakhturina@nvidia.com> * disable sv tests Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix ar test Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * disable sv tests Signed-off-by: Evelina <ebakhturina@nvidia.com> * update ci dirs, enable sv tests Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * 0.1.8 release (#79) Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Codeswitched ES/EN ITN (#78) * Initial commit for ES-EN codeswitched ITN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Enable export for es_en codeswitched ITN Signed-off-by: Anand 
Joseph <anajoseph@nvidia.com> * Add whitelist, update weights Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add tests for en_es, zone tagged separately in es Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix path to test data for sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update Jenkinsfile - enable ES/EN tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add __init__.py files Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix issues with failed docker build - due to archiving of debian and issues with re2 Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Remove unused imports and variables Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update date Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Enable NBSP in sparrowhawk tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update copyrights Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update cache path in for ES/EN CI/CD Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * electronic verbalizer fallback (#81) * 0.1.8 release Signed-off-by: Evelina <ebakhturina@nvidia.com> * add elec fallback Signed-off-by: Evelina <ebakhturina@nvidia.com> * update ci Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * minor normalize.py edit for usability (#84) * electronic verbalizer fallback (#81) * 0.1.8 release Signed-off-by: Evelina 
<ebakhturina@nvidia.com> * add elec fallback Signed-off-by: Evelina <ebakhturina@nvidia.com> * update ci Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> * documentation edits for grammar/clarity Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> * added --output_field flag for command line interface Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> Co-authored-by: Evelina <10428420+ekmb@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Linnea Pari Leaver <lleaver@lleaver-mlt.client.nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Swedish ITN (#40) * force two digits for month Signed-off-by: Jim O'Regan <joregan@kth.se> * put it in a function, because I reject the garbage pre-commit.ci came up with Signed-off-by: Jim O'Regan <joregan@kth.se> * wrap some more pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * add graph pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * delete junk Signed-off-by: Jim O'Regan <joregan@kth.se> * my copyright Signed-off-by: Jim O'Regan <joregan@kth.se> * add date verbaliser (copy from es) Signed-off-by: Jim O'Regan <joregan@kth.se> * tweaks Signed-off-by: Jim O'Regan <joregan@kth.se> * add date verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * add right tokens Signed-off-by: Jim O'Regan <joregan@kth.se> * some tweaks, more needed 
Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * tweaks to TN date tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * tweaks to ITN date tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * tweaks to TN date tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * remove duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * moved to tagger Signed-off-by: Jim O'Regan <joregan@kth.se> * nothing actually fixed here Signed-off-by: Jim O'Regan <joregan@kth.se> * now most tests pass Signed-off-by: Jim O'Regan <joregan@kth.se> * electronic Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fractions Signed-off-by: Jim O'Regan <joregan@kth.se> * extend Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * bare fractions is a bit of an overreach Signed-off-by: Jim O'Regan <joregan@kth.se> * whitelist Signed-off-by: Jim O'Regan <joregan@kth.se> * just inverting the TN whitelist tagger will not work/be useful Signed-off-by: Jim O'Regan <joregan@kth.se> * copy from English Signed-off-by: Jim O'Regan <joregan@kth.se> * overwrite with version from en Signed-off-by: Jim O'Regan <joregan@kth.se> * add basic test case Signed-off-by: Jim O'Regan <joregan@kth.se> * fix call Signed-off-by: Jim O'Regan <joregan@kth.se> * swap tsv sides Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * add optional_era variable Signed-off-by: Jim O'Regan <joregan@kth.se> * add test case Signed-off-by: Jim O'Regan <joregan@kth.se> * make deterministic default, like most of the others Signed-off-by: Jim O'Regan <joregan@kth.se> * also add lowercase versions Signed-off-by: Jim O'Regan <joregan@kth.se> * replacing NEMO_SPACE does not work either Signed-off-by: Jim O'Regan <joregan@kth.se> * 
increasing weight... did not work last time Signed-off-by: Jim O'Regan <joregan@kth.se> * tweaking test cases, in case it was a sentence splitting issue. It was not Signed-off-by: Jim O'Regan <joregan@kth.se> * put the full stops back Signed-off-by: Jim O'Regan <joregan@kth.se> * add filler words Signed-off-by: Jim O'Regan <joregan@kth.se> * try splitting this out to see if it makes a difference Signed-off-by: Jim O'Regan <joregan@kth.se> * aha, this part should be non-deterministic only Signed-off-by: Jim O'Regan <joregan@kth.se> * single line only Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Revert "increasing weight... did not work last time" This reverts commit 39b020b50db745dfd6b281c8cbca45a033926996. Signed-off-by: Jim O'Regan <joregan@kth.se> * disabling ITN here makes TN work again(?) Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "disabling ITN here makes TN work again(?)" This reverts commit be49d7d5c687876e51c2e9ce1cf1e01491df280f. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * changing the variable name fixes norm tests Signed-off-by: Jim O'Regan <joregan@kth.se> * change the variable names Signed-off-by: Jim O'Regan <joregan@kth.se> * add missing test tooling Signed-off-by: Jim O'Regan <joregan@kth.se> * copy telephone fixes from hu Signed-off-by: Jim O'Regan <joregan@kth.se> * copy telephone fixes from hu Signed-off-by: Jim O'Regan <joregan@kth.se> * add a piece for area codes for ITN Signed-off-by: Jim O'Regan <joregan@kth.se> * add country codes from hu Signed-off-by: Jim O'Regan <joregan@kth.se> * extend any_read_digit for ITN Signed-off-by: Jim O'Regan <joregan@kth.se> * country/area codes for ITN Signed-off-by: Jim O'Regan <joregan@kth.se> * first attempt Signed-off-by: Jim O'Regan <joregan@kth.se> * add to t&c Signed-off-by: Jim O'Regan <joregan@kth.se> * add to t&c Signed-off-by: Jim O'Regan <joregan@kth.se> * remove country codes for the time being, makes things ambiguous Signed-off-by: Jim O'Regan <joregan@kth.se> * basic test cases Signed-off-by: Jim O'Regan <joregan@kth.se> * fix Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove trailing whitespace Signed-off-by: Jim O'Regan <joregan@kth.se> * Update __init__.py Signed-off-by: Jim O’Regan <joregan@kth.se> * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * basic transform of TN tests Signed-off-by: Jim O'Regan <joregan@kth.se> * basic transformation of TN decimal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * slight changes to date Signed-off-by: Jim O'Regan <joregan@kth.se> * tweak Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * include space Signed-off-by: Jim O'Regan <joregan@kth.se> * problem with tusen Signed-off-by: Jim O'Regan <joregan@kth.se> * 
problem with tusen was not that Signed-off-by: Jim O'Regan <joregan@kth.se> * add functions from hu Signed-off-by: Jim O'Regan <joregan@kth.se> * respect my own copyright xD Signed-off-by: Jim O'Regan <joregan@kth.se> * move data loading to constructor; had weirdness in this file, probably due to module-level python-suckage Signed-off-by: Jim O'Regan <joregan@kth.se> * move data loading, this has been an oddity before Signed-off-by: Jim O'Regan <joregan@kth.se> * try changing this year declaration Signed-off-by: Jim O'Regan <joregan@kth.se> * add year + era Signed-off-by: Jim O'Regan <joregan@kth.se> * eliminate more module-level data loading Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "eliminate more module-level data loading" This reverts commit 6a26e5d927817e1308e818758196924441ff7b3a. Signed-off-by: Jim O'Regan <joregan@kth.se> * expose variables Signed-off-by: Jim O'Regan <joregan@kth.se> * extra param for itn mode Signed-off-by: Jim O'Regan <joregan@kth.se> * change call Signed-off-by: Jim O'Regan <joregan@kth.se> * change comment Signed-off-by: Jim O'Regan <joregan@kth.se> * change comment Signed-off-by: Jim O'Regan <joregan@kth.se> * move data loading Signed-off-by: Jim O'Regan <joregan@kth.se> * fix parens Signed-off-by: Jim O'Regan <joregan@kth.se> * move data loading Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * adapt comments Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * adapt/extend tests Signed-off-by: Jim O'Regan <joregan@kth.se> * fix dict init/change keys to something useful Signed-off-by: Jim O'Regan <joregan@kth.se> * initial stab at prefixed numbers Signed-off-by: Jim O'Regan <joregan@kth.se> * some adapting Signed-off-by: Jim O'Regan <joregan@kth.se> * insert kl. 
if absent Signed-off-by: Jim O'Regan <joregan@kth.se> * fix comments Signed-off-by: Jim O'Regan <joregan@kth.se> * the relative prefixed times Signed-off-by: Jim O'Regan <joregan@kth.se> * + comments Signed-off-by: Jim O'Regan <joregan@kth.se> * enable time Signed-off-by: Jim O'Regan <joregan@kth.se> * space in both directions Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * fix hours to Signed-off-by: Jim O'Regan <joregan@kth.se> * split by before/after Signed-off-by: Jim O'Regan <joregan@kth.se> * delete, not insert Signed-off-by: Jim O'Regan <joregan@kth.se> * fix if Signed-off-by: Jim O'Regan <joregan@kth.se> * kl. 9 Signed-off-by: Jim O'Regan <joregan@kth.se> * copy from en Signed-off-by: Jim O'Regan <joregan@kth.se> * keep only get_abs_path Signed-off-by: Jim O'Regan <joregan@kth.se> * imports Signed-off-by: Jim O'Regan <joregan@kth.se> * add trimmed file Signed-off-by: Jim O'Regan <joregan@kth.se> * fix imports Signed-off-by: Jim O'Regan <joregan@kth.se> * two abs_paths... could be fun Signed-off-by: Jim O'Regan <joregan@kth.se> * minutes/seconds Signed-off-by: Jim O'Regan <joregan@kth.se> * suffix Signed-off-by: Jim O'Regan <joregan@kth.se> * delete, not insert Signed-off-by: Jim O'Regan <joregan@kth.se> * one optional Signed-off-by: Jim O'Regan <joregan@kth.se> * export variable Signed-off-by: Jim O'Regan <joregan@kth.se> * kl. or one of suffix/zone Signed-off-by: Jim O'Regan <joregan@kth.se> * already disambiguated Signed-off-by: Jim O'Regan <joregan@kth.se> * closure Signed-off-by: Jim O'Regan <joregan@kth.se> * do not insert kl. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * fix test case Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix spelling Signed-off-by: Jim O'Regan <joregan@kth.se> * Delete measure.py Signed-off-by: Jim O’Regan <joregan@kth.se> * Delete money.py Signed-off-by: Jim O’Regan <joregan@kth.se> * remove unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused test pieces Signed-off-by: Jim O'Regan <joregan@kth.se> * copy from es Signed-off-by: Jim O'Regan <joregan@kth.se> * add SV ITN Signed-off-by: Jim O'Regan <joregan@kth.se> * add/update __init__ Signed-off-by: Jim O'Regan <joregan@kth.se> * blank line Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * fix lang Signed-off-by: Jim O'Regan <joregan@kth.se> * fix decimal verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix Signed-off-by: Jim O'Regan <joregan@kth.se> * remove year, conflicts with cardinal Signed-off-by: Jim O'Regan <joregan@kth.se> * space before, not after Signed-off-by: Jim O'Regan <joregan@kth.se> * fix cardinal tests Signed-off-by: Jim O'Regan <joregan@kth.se> * spurious deletion Signed-off-by: Jim O'Regan <joregan@kth.se> * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * unused imports Signed-off-by: Jim O'Regan <joregan@kth.se> * re-enable SV TN; enable SV ITN Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "re-enable SV TN; enable SV ITN" This reverts commit 3ce4dfde1f70a89afc274284f6e4c737b3fac95b. 
Signed-off-by: Jim O'Regan <joregan@kth.se> * fix singulras Signed-off-by: Jim O'Regan <joregan@kth.se> * add an export Signed-off-by: Jim O'Regan <joregan@kth.se> * change integer graph Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * move spaces Signed-off-by: Jim O'Regan <joregan@kth.se> * use cdrewrite Signed-off-by: Jim O'Regan <joregan@kth.se> * just EOS/BOS Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci Signed-off-by: Jim O'Regan <joregan@kth.se> * omit en/ett, because they are also articles Signed-off-by: Jim O'Regan <joregan@kth.se> * uncomment Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * unused Signed-off-by: Jim O'Regan <joregan@kth.se> * strip spaces from decimal part Signed-off-by: Jim O'Regan <joregan@kth.se> * export Signed-off-by: Jim O'Regan <joregan@kth.se> * partial fix, not what I wanted Signed-off-by: Jim O'Regan <joregan@kth.se> * move comment Signed-off-by: Jim O'Regan <joregan@kth.se> * en/ett cannot work in itn case Signed-off-by: Jim O'Regan <joregan@kth.se> * be more deliberate in graph construction Signed-off-by: Jim O'Regan <joregan@kth.se> * accept both Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * +2 tests Signed-off-by: Jim O'Regan <joregan@kth.se> * (try to) accept singular quantities for plurals Signed-off-by: Jim O'Regan <joregan@kth.se> * retry Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * oops Signed-off-by: Jim O'Regan <joregan@kth.se> * replace Signed-off-by: Jim O'Regan 
<joregan@kth.se> * arcmap Signed-off-by: Jim O'Regan <joregan@kth.se> * version without ones Signed-off-by: Jim O'Regan <joregan@kth.se> * add another test Signed-off-by: Jim O'Regan <joregan@kth.se> * change graph Signed-off-by: Jim O'Regan <joregan@kth.se> * simplify Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of this, this is where it goes wrong Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * add a test Signed-off-by: Jim O'Regan <joregan@kth.se> * multiple states from both ones, try removing and readding Signed-off-by: Jim O'Regan <joregan@kth.se> * remove ones, see if that fixes at least the bare quantities Signed-off-by: Jim O'Regan <joregan@kth.se> * works in the repl, dunno why it still breaks Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove duplicate Signed-off-by: Jim O'Regan <joregan@kth.se> * move definition Signed-off-by: Jim O'Regan <joregan@kth.se> * simplify Signed-off-by: Jim O'Regan <joregan@kth.se> * tweak Signed-off-by: Jim O'Regan <joregan@kth.se> * another test Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * local declaration, seems to not be working Signed-off-by: Jim O'Regan <joregan@kth.se> * more tests Signed-off-by: Jim O'Regan <joregan@kth.se> * match verbaliser Signed-off-by: Jim O'Regan <joregan@kth.se> * fix last two failing tests Signed-off-by: Jim O'Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add missing tests for telephone and word Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused variable Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused imports Signed-off-by: 
Jim O'Regan <joregan@kth.se> * fix comment Signed-off-by: Jim O'Regan <joregan@kth.se> * get rid of convert_space, tests fail Signed-off-by: Jim O'Regan <joregan@kth.se> * put convert_spaces back, change test file; pytest fails Signed-off-by: Jim O'Regan <joregan@kth.se> * Revert "put convert_spaces back, change test file; pytest fails" This reverts commit a7bb7489137b8026aab02aff64df39e874630043. Signed-off-by: Jim O'Regan <joregan@kth.se> * put convert_spaces back, change test file; pytest fails, take 2 Signed-off-by: Jim O'Regan <joregan@kth.se> * deliberately remove spaces rather than have a non-determinism that comes out differently in sparrowhawk Signed-off-by: Jim O'Regan <joregan@kth.se> * try converting the non-breaking spaces in the shell script Signed-off-by: Jim O'Regan <joregan@kth.se> * wrong place Signed-off-by: Jim O'Regan <joregan@kth.se> * fix typo Signed-off-by: Jim O'Regan <joregan@kth.se> * fix path Signed-off-by: Jim O'Regan <joregan@kth.se> * export Signed-off-by: Jim O'Regan <joregan@kth.se> * export Signed-off-by: Jim O'Regan <joregan@kth.se> * remove unused Signed-off-by: Jim O'Regan <joregan@kth.se> * Update date.py Signed-off-by: Jim O’Regan <joregan@kth.se> * Update time.py Signed-off-by: Jim O’Regan <joregan@kth.se> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix comment Signed-off-by: Jim O’Regan <joregan@kth.se> * trim comments Signed-off-by: Jim O’Regan <joregan@kth.se> * remove commented line Signed-off-by: Jim O’Regan <joregan@kth.se> * en halv Signed-off-by: Jim O’Regan <joregan@kth.se> * Update test_sparrowhawk_inverse_text_normalization.sh Signed-off-by: Jim O’Regan <joregan@kth.se> --------- Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Jim O’Regan <joregan@kth.se> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Italian_TN (#67) * add TN italian Signed-off-by: 
GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix init Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix LOCATION Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * modify graph_utils Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * correct decimals Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix electronic Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com> * fix electronic Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com> * fix measure Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com> --------- Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> Signed-off-by: Giacomo Cavallini <giacomoleonemaria@gmail.com> Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Mariana <47233618+mgrafu@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Zh itn (#74) * Add ZH ITN Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix copyrights and code cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Remove invalid tests Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Resolve CodeQL issues Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Cleanup Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Fix missing 'zh' option for ITN and correct comment Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Update __init__.py Change to zh instead of en for the imports. 
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update for decimal test data Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * update for langauge import Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * update for Chinese punctuations Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * a new class for whitelist Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * PYNINI_AVAILABLE = False Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * recreated due to file import format issue Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * recreated due to format issue Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * caught duplicates, removed Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed duplicates, arranges for CHInese Yuan updates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates accordingly to the comments from last PR. Recreated some of the files due to format issues Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed the hours_to and minute_to files used for back counting. ALso removed am and pm suffix files according to the last PR. Recreated some of them for format issue Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * re-added this file to avoid data file import error Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated gramamr according to last PR. Removed the acceptance of 千 Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated according to last PR. Removed comma after decimal points Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * gramamr for Fraction Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * gramamr for money and updated according to last PR. 
Plus process of 元 Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * ordinal grammar. updates due to the updates in cardinal grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated accordingly to last PR comments. removing am and pm and allowing simple mandarin expression Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * arrangements Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added whitelist grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * word grammar for non-classified items Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated cardinal, decimal, time, itn data Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates according to last PR Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates according to the updates for cardinal grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates for more Mandarin punctuations Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated accordingly to last PR. removing am pm Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * adjustment on the weight Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated accordingly to the targger updates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated accordingly to the time tagger Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates according to changes in tagger on am and pm Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * verbalizer for fraction Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added for mandarin grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * kept this file because using English utils results in data namin error Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * merge conflict Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed unsed imports Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * deleted unsed import os Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * deleted unsed variables Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed unsed imports Signed-off-by: 
BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updates and edits based on pr checks Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates and edits based on pr checks Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * format issue, reccreated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * format issue recreated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fixed codeing style/format Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fixed coding style and format Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed duplicated graph for 毛 Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed the comment Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed the comment Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removing unnecessary comments Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * unnecessary comment removed Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * test file updated for more cases Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated with a comment explaining why this file is kept Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated the file explaining why this file is kept Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added Mandarin as zh Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removing for dplication Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * removed unused NEMO objects Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed duplicates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removing unsed imports Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to fix test 
file failures Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to fix file failtures Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to resolve test case failture Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to resolve test case failure Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to resolve test case failure Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to resolve test case failure Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to adap to cardinal grammar changes Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to adapt to grammar changes Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates to adopt to cardinal grammar changes Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix style Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fix style Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fix style Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fix style Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fixing pr checks Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed // for zhtn/itn cache Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * Update inverse_normalize.py Added zh as a selection to pass Jenkins checks. 
Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> --------- Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> Signed-off-by: BuyuanCui <alexcui1994@gmail.com> Co-authored-by: Alex Cui <alcui@nvidia.com> Co-authored-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * updated pynini_export.py file to create far files (#88) Signed-off-by: BuyuanCui <alexcui1994@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * readd Swedish (#87) Signed-off-by: Jim O'Regan <joregan@kth.se> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Zh tn 0712 (#89) * updates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates and fixings according to document on natonal gideline Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * Decimal grammar added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fraction updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * money updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * ordinal grammar added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * punctuation grammar added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * time gramamr updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * tokenizaer updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates on certificate Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * data updated and added due to updates and chanegs to the existing grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * cardinal updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * date grammar changed Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * decimal grammar added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * grammar updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * grammar updated Signed-off-by: BuyuanCui 
<alexcui1994@gmail.com> * grammar added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * grammar updates Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * test data added Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * test python file edits Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates for tn1.0 and previous tn grammar from contribution Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * test cases updated Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * coding style fixed Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * dates updated for init files Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated the date for zh Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed unsed imports Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * removed comments Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added back the itn tests Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added back measure and math from previou TN Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated for tests reruns Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updats Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated weights Signed-off-by: BuyuanCui <alexcui1994@gmail.com> --------- Signed-off-by: BuyuanCui <alexcui1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Zh tn char (#95) * file name change Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name change Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name change Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name change Signed-off-by: BuyuanCui 
<alexcui1994@gmail.com> * file name change Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * file name Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * code stle Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * fixed import error Signed-off-by: BuyuanCui <alexcui1994@gmail.com> --------- Signed-off-by: BuyuanCui <alexcui1994@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * audio-based TN fix for empty pred_text/text (#92) * fix for empty pred_text Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add unittests Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix path Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix path Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix pytest Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * pip 1.2.0 Signed-off-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * French tn (#91) * add tests for fr tn Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add fr tn for cardinals, decimals, fractions and ordinals Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more 
information, see https://pre-commit.ci * delete it far files from tools Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * add languages to run_evaluate Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * remove ambiguous spacing Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * enable sh testing for fr tn Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix bug with ordinals Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update jenkinsfile cache date Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * fix test for ordinals Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update tn cache for fr Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * resolve codeql issues Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Add whitelist_tech.tsv (#96) Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Zhitn 0727 (#93) * updates on itn grammar to pass sparrowhawk tests Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updats for sparrowhawk tests Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates fro sparrowhawk tests Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * coding style fix Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updates for coding style and sparrowhawk test Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * updated classes for tests on whitelist and word grammar Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added for tests on whitelist Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added for test on word Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added to run test on whitelist Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * added to run test on word 
Signed-off-by: BuyuanCui <alexcui1994@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update test_word.py Removed unused import. Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * Update test_word.py Removed imports according to CodeQL Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * Update test_whitelist.py Removing imports according to CodeQL Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * Update test_whitelist.py Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> * Update Jenkinsfile changed zh cache to 07-27-23 as it is the latest update. Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> --------- Signed-off-by: BuyuanCui <alexcui1994@gmail.com> Signed-off-by: Buyuan(Alex) Cui <69030297+BuyuanCui@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Es tn romans fix (#98) * fix es tn roman exceptions Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update jenkinsfile Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update eval script for ITN Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * codeql fix Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Change docker image (#102) Change docker image to one including sparrowhawk Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Print warning instead exception (#97) * raise text Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * text arg Signed-off-by: 
Nikolay Karpov <karpnv@gmail.com> * Failed text Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * add logger Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm raise Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * logger Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * NeMo-text-processing Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * info level Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * rm raise Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * verbose Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Normalizer.select_verbalizer Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * Exception Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * verbose Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * restart ci Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com> Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Nikolay Karpov <nkarpov@nvidia.com> Co-authored-by: Evelina <ebakhturina@nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * warning regardless of verbose flag (#107) * warning Signed-off-by: Nikolay Karpov <karpnv@gmail.com> * self.verbose Signed-off-by: Nikolay Karpov <karpnv@gmail.com> --------- Signed-off-by: Nikolay Karpov <karpnv@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Unpin 
setuptools (#106) Signed-off-by: Peter Plantinga <plantinga.peter@proton.me> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fixed warnings: File is not always closes. (#113) Signed-off-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Co-authored-by: Xuesong Yang <16880-xueyang@users.noreply.gitlab-master.nvidia.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * fix bug #111 (ar currencies) (#117) * fix bug #111 (ar currencies) Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> * update ci folder Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> --------- Signed-off-by: Mariana Graterol Fuenmayor <mgrafu@gmail.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Logging clean up + IT TN fix (#118) * fix utils and it TN Signed-off-by: Evelina <ebakhturina@nvidia.com> * clean up Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix logging Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix format Signed-off-by: Evelina <ebakhturina@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix format Signed-off-by: Evelina <ebakhturina@nvidia.com> * fix format Signed-off-by: Evelina <ebakhturina@nvidia.com> * add IT TN to CI Signed-off-by: Evelina <ebakhturina@nvidia.com> * update patch Signed-off-by: Evelina <ebakhturina@nvidia.com> --------- Signed-off-by: Evelina <ebakhturina@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Alex Cui <alexcui1994@gmail.com> * Time_IT_TN (#105) * add time verbalizer Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * add time tagger and verba Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * add pytest time Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for 
more information, see https://pre-commit.ci * codeQL Signed-off-by: GiacomoLeoneMaria <giacomoleonemaria@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix numbers …
1 parent edd2f46 commit f2664c1

14 files changed

Lines changed: 85 additions & 48 deletions


Jenkinsfile

Lines changed: 1 addition & 1 deletion
@@ -476,4 +476,4 @@ pipeline {
       cleanWs()
     }
   }
-}
+}
Lines changed: 32 additions & 32 deletions
@@ -1,32 +1,32 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-
-import pynini
-from pynini.lib import pynutil
-
-from nemo_text_processing.text_normalization.zh.graph_utils import NEMO_NOT_QUOTE, GraphFst, delete_space
-
-
-class WordFst(GraphFst):
-    '''
-        tokens { char: "一" } -> 一
-    '''
-
-    def __init__(self, deterministic: bool = True, lm: bool = False):
-        super().__init__(name="char", kind="verbalize", deterministic=deterministic)
-
-        graph = pynutil.delete("name: \"") + NEMO_NOT_QUOTE + pynutil.delete("\"")
-        graph = pynini.closure(delete_space) + graph + pynini.closure(delete_space)
-        self.fst = graph.optimize()
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import pynini
+from pynini.lib import pynutil
+
+from nemo_text_processing.text_normalization.zh.graph_utils import NEMO_NOT_QUOTE, GraphFst, delete_space
+
+
+class WordFst(GraphFst):
+    '''
+        tokens { char: "一" } -> 一
+    '''
+
+    def __init__(self, deterministic: bool = True, lm: bool = False):
+        super().__init__(name="char", kind="verbalize", deterministic=deterministic)
+
+        graph = pynutil.delete("name: \"") + NEMO_NOT_QUOTE + pynutil.delete("\"")
+        graph = pynini.closure(delete_space) + graph + pynini.closure(delete_space)
+        self.fst = graph.optimize()
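
As a reading aid for the verbalizer graph above, here is a minimal plain-Python sketch of the same string transformation. It uses `re` rather than pynini, and it rests on two assumptions that the diff does not confirm: that `NEMO_NOT_QUOTE` accepts exactly one non-quote character, and that `delete_space` can be approximated by `\s*`. The name `verbalize_char` is hypothetical. Note that the sketch follows the graph, which deletes a `name: "` prefix, whereas the docstring shows a `char` field.

```python
import re

# Assumption: one non-quote character between the quotes, optional
# surrounding whitespace (approximating closure(delete_space)).
_TOKEN_RE = re.compile(r'\s*name: "([^"])"\s*')


def verbalize_char(serialized):
    """Return the bare character from a serialized field, or None if rejected."""
    match = _TOKEN_RE.fullmatch(serialized)
    return match.group(1) if match else None
```

Under these assumptions, an input that uses the `char:` field name from the docstring would be rejected by the graph as written.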

nemo_text_processing/text_normalization/en/taggers/electronic.py

Lines changed: 14 additions & 0 deletions
@@ -52,6 +52,8 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):

         cc_cues = pynutil.add_weight(pynini.string_file(get_abs_path("data/electronic/cc_cues.tsv")), MIN_NEG_WEIGHT,)

+        cc_cues = pynutil.add_weight(pynini.string_file(get_abs_path("data/electronic/cc_cues.tsv")), MIN_NEG_WEIGHT)
+
         accepted_symbols = pynini.project(pynini.string_file(get_abs_path("data/electronic/symbol.tsv")), "input")
         accepted_common_domains = pynini.project(
             pynini.string_file(get_abs_path("data/electronic/domain.tsv")), "input"
@@ -135,6 +137,18 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True):
             )
             graph |= cc_phrases

+        if deterministic:
+            # credit card cues
+            numbers = pynini.closure(NEMO_DIGIT, 4, 16)
+            cc_phrases = (
+                pynutil.insert("protocol: \"")
+                + cc_cues
+                + pynutil.insert("\" domain: \"")
+                + numbers
+                + pynutil.insert("\"")
+            )
+            graph |= cc_phrases
+
         final_graph = self.add_tokens(graph)

         self.fst = final_graph.optimize()
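
The credit-card cue block in the hunk above can be read as: a cue phrase followed directly by 4 to 16 digits is tagged as `protocol: "<cue>" domain: "<digits>"`. The following is a plain-Python sketch of that shape only, not the pynini graph. The cue phrases are hypothetical placeholders (the real ones come from `data/electronic/cc_cues.tsv`, whose contents are not shown here), and `tag_cc_phrase` is an illustrative name.

```python
import re

# Hypothetical cue phrases, stand-ins for the entries of cc_cues.tsv.
CC_CUES = ["card ending in ", "credit card number "]

# A cue followed by 4 to 16 digits, mirroring closure(NEMO_DIGIT, 4, 16).
_CUE_RE = re.compile("(" + "|".join(map(re.escape, CC_CUES)) + r")(\d{4,16})")


def tag_cc_phrase(text):
    """Return the tagged fields for a cue plus digits, or None if no match."""
    match = _CUE_RE.fullmatch(text)
    if match is None:
        return None
    cue, digits = match.groups()
    return f'protocol: "{cue}" domain: "{digits}"'
```

The bounds matter: fewer than 4 or more than 16 digits after the cue fall outside this branch, so such inputs would have to be handled by other parts of the graph.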

nemo_text_processing/text_normalization/zh/taggers/money.py

Lines changed: 6 additions & 7 deletions
@@ -19,7 +19,6 @@
1919
from nemo_text_processing.text_normalization.zh.graph_utils import GraphFst
2020
from nemo_text_processing.text_normalization.zh.utils import get_abs_path
2121

22-
# def get_quantity(decimal):
2322
suffix = pynini.union(
2423
"万",
2524
"十万",
@@ -107,7 +106,7 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
107106
# larger money as decimals
108107
graph_decimal = (
109108
pynutil.insert('integer_part: \"')
110-
+ pynini.closure(
109+
+ (
111110
pynini.closure(cardinal, 1)
112111
+ pynutil.delete('.')
113112
+ pynutil.insert('点')
@@ -117,14 +116,16 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
117116
)
118117
graph_decimal_money = (
119118
pynini.closure(graph_decimal, 1)
120-
+ pynini.closure(pynutil.insert(' quantity: \"') + suffix + pynutil.insert('\"'))
119+
+ pynini.closure((pynutil.insert(' quantity: \"') + suffix + pynutil.insert('\"')), 0, 1)
121120
+ pynutil.insert(" ")
122121
+ pynini.closure(currency_mandarin_component, 1)
123122
) | (
124123
pynini.closure(currency_component, 1)
125124
+ pynutil.insert(" ")
126125
+ pynini.closure(graph_decimal, 1)
127-
+ pynini.closure(pynutil.insert(" ") + pynutil.insert('quantity: \"') + suffix + pynutil.insert('\"'))
126+
+ pynini.closure(
127+
(pynutil.insert(" ") + pynutil.insert('quantity: \"') + suffix + pynutil.insert('\"')), 0, 1
128+
)
128129
)
129130

130131
graph = (
@@ -134,7 +135,5 @@ def __init__(self, cardinal: GraphFst, deterministic: bool = True, lm: bool = Fa
134135
| pynutil.add_weight(graph_decimal_money, -1.0)
135136
)
136137

137-
final_graph = graph
138-
139-
final_graph = self.add_tokens(final_graph)
138+
final_graph = self.add_tokens(graph)
140139
self.fst = final_graph.optimize()
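
The key behavioral change in this file is the switch from `pynini.closure(x)` to `pynini.closure(x, 0, 1)` around the quantity suffix. An unbounded closure is a Kleene star (zero or more repetitions), whereas bounds of 0 and 1 make the suffix optional and allow it at most once. The sketch below illustrates that difference with Python's `re` module rather than pynini; the two-item suffix list and the `accepts` helper are assumptions for illustration, not the grammar's actual contents.

```python
import re

# Illustrative subset of the suffix union from the diff (assumption:
# only the two entries visible in the hunk are used here).
SUFFIX = "(?:十万|万)"
# Simplified decimal number; the real grammar verbalizes cardinals instead.
NUMBER = r"[0-9]+\.[0-9]+"

# Before the change: closure(x) with no bounds behaves like "*".
before = re.compile(NUMBER + SUFFIX + "*")
# After the change: closure(x, 0, 1) behaves like "?".
after = re.compile(NUMBER + SUFFIX + "?")


def accepts(pattern, text):
    """True if the whole text is accepted by the pattern."""
    return pattern.fullmatch(text) is not None
```

With these patterns, a doubled suffix such as `1.5万万` is accepted by `before` but rejected by `after`, while `1.5万` and a bare `1.5` are accepted by both. In the tagger this presumably prevents more than one `quantity` field from being emitted for a single amount.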

tests/nemo_text_processing/en/test_sparrowhawk_inverse_text_normalization.sh

Lines changed: 1 addition & 1 deletion
@@ -82,4 +82,4 @@ testITNWord() {
 shift $#

 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2

tests/nemo_text_processing/en/test_sparrowhawk_inverse_text_normalization_cased.sh

Lines changed: 1 addition & 1 deletion
@@ -82,4 +82,4 @@ testITNWord() {
 shift $#

 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2

tests/nemo_text_processing/en/test_sparrowhawk_normalization.sh

Lines changed: 1 addition & 1 deletion
@@ -119,4 +119,4 @@ testTNMath() {
 shift $#

 # Load shUnit2
-. /workspace/shunit2/shunit2
+. /workspace/shunit2/shunit2

tests/nemo_text_processing/mr/test_cardinal.py

Lines changed: 3 additions & 1 deletion
@@ -16,11 +16,13 @@
 from parameterized import parameterized

 from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
+from nemo_text_processing.text_normalization.normalize import Normalizer

 from ..utils import CACHE_DIR, parse_test_case_file


-class TestCardinal:
+class TestPreprocess:
+
     inverse_normalizer_mr = InverseNormalizer(lang='mr', cache_dir=CACHE_DIR, overwrite_cache=False)

     @parameterized.expand(parse_test_case_file('mr/data_inverse_text_normalization/test_cases_cardinal.txt'))

tests/nemo_text_processing/mr/test_date.py

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@
 from parameterized import parameterized

 from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer
+from nemo_text_processing.text_normalization.normalize import Normalizer

 from ..utils import CACHE_DIR, parse_test_case_file


tests/nemo_text_processing/zh/data_text_normalization/test_cases_word.txt

Lines changed: 19 additions & 1 deletion
@@ -4,4 +4,22 @@
 只有智商超过一定数值的人才能破解~只有智商超过一定数值的人才能破解
 这是由人工智能控制的系统~这是由人工智能控制的系统
 欧洲旅游目的地多到不知道怎么选~欧洲旅游目的地多到不知道怎么选
-马斯科卖掉豪宅住进折叠屋~马斯科卖掉豪宅住进折叠屋
+马斯科卖掉豪宅住进折叠屋~马斯科卖掉豪宅住进折叠屋
+免除GOOGLE在一桩诽谤官司中的法律责任。~免除GOOGLE在一桩诽谤官司中的法律责任。
+这对CHROME是有利的。~这对CHROME是有利的。
+这可能是PILde使用者。~这可能是PILde使用者。
+CSI侧重科学办案,也就是现场搜正和鉴识。~CSI侧重科学办案,也就是现场搜正和鉴识。
+我以前非常喜欢一个软体,DRAW。~我以前非常喜欢一个软体,DRAW。
+我爱你病毒。~我爱你病毒。
+微软举办了RACETOMARKETCHALLENGE竞赛。~微软举办了RACETOMARKETCHALLENGE竞赛。
+苹果销售量的复苏程度远超PC市场。~苹果销售量的复苏程度远超PC市场。
+第三季还有两款ANDROID手机亮相。~第三季还有两款ANDROID手机亮相。
+反而应试著让所有GOOGLE服务更加社交化。~反而应试著让所有GOOGLE服务更加社交化。
+GOOGLE已提供一项NATIVECLIENT软体。~GOOGLE已提供一项NATIVECLIENT软体。
+这些程式都支援PRE与ITUNES同步化。~这些程式都支援PRE与ITUNES同步化。
+可以推断此次NTT可能也会将同样的策略用在LTE上。~可以推断此次NTT可能也会将同样的策略用在LTE上。
+现今许多小型企业因成本考量被迫采用一般PC作为伺服器。~现今许多小型企业因成本考量被迫采用一般PC作为伺服器。
+部落格宣布GOOGLECHROMES的诞生。~部落格宣布GOOGLECHROMES的诞生。
+由ZIP订购机场接送或观光景点共乘服务。~由ZIP订购机场接送或观光景点共乘服务。
+PAQUE表示短时间应该还不会全面开放。~PAQUE表示短时间应该还不会全面开放。
+CBS是美国一家重要的广播电视网路公司。~CBS是美国一家重要的广播电视网路公司。
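
Each line in this test-case file pairs an input with its expected normalization, separated by `~`. For the word cases added here the two sides are identical, so the tests assert that mixed Chinese and uppercase Latin text passes through unchanged. A minimal sketch of reading that format follows; `parse_test_case_line` is a hypothetical helper written for illustration and is not the repository's `parse_test_case_file`, whose implementation is not shown in this diff.

```python
def parse_test_case_line(line):
    """Split one 'input~expected' line into an (input, expected) pair.

    Hypothetical helper: splits on the first '~' only and strips the
    trailing newline, which is an assumption about the file format.
    """
    written, expected = line.rstrip("\n").split("~", 1)
    return written, expected


# Two of the cases added in this commit.
SAMPLE_LINES = [
    "这对CHROME是有利的。~这对CHROME是有利的。\n",
    "我爱你病毒。~我爱你病毒。\n",
]

PAIRS = [parse_test_case_line(line) for line in SAMPLE_LINES]
# For word cases the expected output equals the input.
ALL_IDENTITY = all(written == expected for written, expected in PAIRS)
```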
