Conversation
nemo_text_processing/inverse_text_normalization/ko/graph_utils.py
Outdated
Show resolved
Hide resolved
nemo_text_processing/inverse_text_normalization/ko/taggers/__init__.py
Outdated
Show resolved
Hide resolved
nemo_text_processing/inverse_text_normalization/ko/verbalizers/__init__.py
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,11 @@ | |||
| 억 | |||
There was a problem hiding this comment.
based on how you implemented the rules, do we need this file?
There was a problem hiding this comment.
I kept this file for tracking all the units for Korean cardinal, so yes I can delete it. But, for my cardinal taggers, I only expressed the number until the digit of ten quintillion(천경). I was wondering if I should go higher than that. If yes, I can write the code quickly, since it just all loops.
| ### Responsible for the first digit of number. ex) 1,2,3,4,5,,, | ||
| graph_ten_component += graph_digit | pynutil.insert("0") | ||
|
|
||
| hundred = pynutil.delete("백") |
There was a problem hiding this comment.
let's have tests for each and every one of these blocks, including "delete" and "cross" the 백
There was a problem hiding this comment.
Tested by commenting out each lines and I do need both lines to have the test result that I wanted. "delete" line is specifically for hundredth digit that is not "1". (For ex. 200, 300, 400) "cross" line is specifically for if hundredth digit is 1. (100). Let me know if you meant something else by "testing" it.
| leading_zero = ( | ||
| pynutil.delete(pynini.closure("0")) + pynini.difference(NEMO_DIGIT, "0") + pynini.closure(NEMO_DIGIT) | ||
| ) | ||
| graph_nonzero = graph @ leading_zero |
| graph_nonzero = graph @ leading_zero | ||
| graph = pynini.union(graph_nonzero, graph_zero) | ||
|
|
||
| graph = graph @ leading_zero | graph_zero |
There was a problem hiding this comment.
is graph @ leading_zero replaceable with graph_nonzero?
|
|
||
| final_graph = ( | ||
| optional_sign + pynutil.insert(" ") + pynutil.insert("integer: \"") + graph + pynutil.insert("\"") | ||
| ) | (pynutil.insert("integer: \"") + graph + pynutil.insert("\"")) |
There was a problem hiding this comment.
if the sign is already optional, do we need this line? or is it enough to also make the first space part of the optional sign as well?
There was a problem hiding this comment.
I believe I still need it for showing result on terminal. Optional sign could be replaced with graph_negative if I redefine properly...?
Realized I am not using graph_negative as I am replying to this comment. Deleted graph_negative and one more line related to it(line 39, 40)
| pynutil.delete("negative:") | ||
| + delete_space | ||
| + pynutil.delete("\"") | ||
| + pynini.accep("-") |
There was a problem hiding this comment.
you could also make it "negative": "true" in the tagger and do pynini.cross("\"true\"") here -- check how English does it
There was a problem hiding this comment.
I actually don't see it on ITN part of English, but if you are referring to TN part of English, I do see it. Do you want me to change it in that format?
| def __init__(self): | ||
| super().__init__(name="cardinal", kind="classify") | ||
|
|
||
| graph_zero = pynini.string_file(get_abs_path("data/numbers/zero.tsv")) |
Check warning
Code scanning / CodeQL
Variable defined multiple times
| from nemo_text_processing.inverse_text_normalization.ko.graph_utils import ( | ||
| INPUT_LOWER_CASED, | ||
| GraphFst, | ||
| delete_extra_space, | ||
| delete_space, | ||
| generator_main, | ||
| ) |
Check notice
Code scanning / CodeQL
Unused import
|
|
||
| from nemo_text_processing.inverse_text_normalization.ko.graph_utils import GraphFst, delete_space, generator_main | ||
| from nemo_text_processing.inverse_text_normalization.ko.verbalizers.verbalize import VerbalizeFst | ||
| from nemo_text_processing.inverse_text_normalization.ko.verbalizers.word import WordFst |
Check notice
Code scanning / CodeQL
Unused import
| # limitations under the License. | ||
|
|
||
|
|
||
| import pynini |
Check notice
Code scanning / CodeQL
Unused import
| import pynini | ||
| from pynini.lib import pynutil | ||
|
|
||
| from nemo_text_processing.inverse_text_normalization.ko.graph_utils import NEMO_NOT_QUOTE, GraphFst, delete_space |
Check notice
Code scanning / CodeQL
Unused import
| from parameterized import parameterized | ||
|
|
||
| from nemo_text_processing.inverse_text_normalization.inverse_normalize import InverseNormalizer | ||
| from nemo_text_processing.text_normalization.normalize import Normalizer |
Check notice
Code scanning / CodeQL
Unused import
Sparrowhawk testing is not done yet. Signed-off-by: hmlee245 <hmlee245@gmail.com>
d8e057d to
fa304a2
Compare
for more information, see https://pre-commit.ci
Haven't fixed the graph@leading zero part
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN
for more information, see https://pre-commit.ci
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN Signed-off-by: hmlee245 <hmlee245@gmail.com>
a2996ba to
2c832f5
Compare
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN Signed-off-by: hmlee245 <hmlee245@gmail.com>
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN Signed-off-by: hmlee245 <hmlee245@gmail.com>
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN
Fixes issue with sparrowhawk builds as the original base image is no longer maintained and build breaks Signed-off-by: anand-nv <105917641+anand-nv@users.noreply.github.com> Signed-off-by: hmlee245 <hmlee245@gmail.com>
* Future implementations to date.py - Hindi ITN (NVIDIA#265) * Addition of whitelist and word classes Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updation of Jenkins date Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Cleanup Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Updation Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Updation Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Future implementations for date Signed-off-by: Tarushi V <tarushiv@nvidia.com> * pushing rough date code for ref Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Future implementations date.py Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Cleanup Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updation of Jenkinsfile Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Telephone.py-hindi itn Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Telephone.py - Hindi ITN Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Telephone modified tagger and verbalizer Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * telephone tagger with 3,4,5 digit std codes Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Further additions - telephone.py Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Jenkins update Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Telephone.py Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Updated tagger-telephone.py Signed-off-by: Tarushi V <tarushiv@nvidia.com> * Telephone and Jenkinsfile cleanup Signed-off-by: Tarushi V <tarushiv@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update Jenkins Signed-off-by: Tarushi V <tarushiv@nvidia.com> --------- Signed-off-by: Tarushi V <tarushiv@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> * Add missing __init__.py file Signed-off-by: Anand Joseph <anajoseph@nvidia.com> --------- Signed-off-by: Tarushi V <tarushiv@nvidia.com> Signed-off-by: Anand Joseph <anajoseph@nvidia.com> Co-authored-by: tarushi2k2 <tarushiv@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: hmlee245 <hmlee245@gmail.com>
* add base coverage for fr tn date * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: hmlee245 <hmlee245@gmail.com>
* [pre-commit.ci] pre-commit suggestions updates: - [github.com/pre-commit/pre-commit-hooks: v4.3.0 → v5.0.0](pre-commit/pre-commit-hooks@v4.3.0...v5.0.0) - [github.com/PyCQA/flake8: 7.1.1 → 7.2.0](PyCQA/flake8@7.1.1...7.2.0) - [github.com/PyCQA/isort: 5.12.0 → 6.0.1](PyCQA/isort@5.12.0...6.0.1) - [github.com/psf/black: 19.10b0 → 25.1.0](psf/black@19.10b0...25.1.0) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Signed-off-by: hmlee245 <hmlee245@gmail.com>
Sparrowhawk testing is not done yet. Signed-off-by: hmlee245 <hmlee245@gmail.com>
Haven't fixed the graph@leading zero part Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci Signed-off-by: hmlee245 <hmlee245@gmail.com>
Signed-off-by: hmlee245 <hmlee245@gmail.com>
for more information, see https://pre-commit.ci Signed-off-by: hmlee245 <hmlee245@gmail.com>
…245/NeMo-text-processing into Draft/-Korean-Cardinal-ITN
Sparrowhawk testing is not done yet.
What does this PR do ?
This draft PR is for checking the tagger ITN cardinal code and other files.
Before your PR is "Ready for review"
Pre checks:
git commit -sto sign.pytestor (if your machine does not have GPU)pytest --cpufrom the root folder (given you marked your test cases accordingly@pytest.mark.run_only_on('CPU')).bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...pytestand Sparrowhawk here.__init__.pyfor every folder and subfolder, includingdatafolder which has .TSV files?Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.to all newly added Python files?Copyright 2015 and onwards Google, Inc.. See an example here.try import: ... except: ...) if not already done.PR Type:
Draft
If you haven't finished some of the above items you can still open "Draft" PR.