Commit 3de4021

mgrafu, ngachchi, pre-commit-ci[bot], and github-advanced-security[bot] committed
Staging hi tn (NVIDIA#271)
* Future Implementations for classes - Measure, Money, and Date (NVIDIA#258)

  * Future Implementations for classes - Measure, Money, and Date
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * Resolved the conflicts with mm_yyyy and date ranges and added the previously removed failing test cases.
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * removed the unused empty string implementation
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * minor fixes for the tagger files
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * reformatted decimal final graph
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * incorporated the suggestion for decimal graph
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * Century implementations
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * Working on the yyyy format for the date class
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * reverted yyyy code
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * working on future implementations
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * working on improving the date class accuracy
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * added year prefix for the date class
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * working on the comma cases for date class
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * minor fixes
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * implemented mixed fractions
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * rectified the test case
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * working on quarterly measurements
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * reformatted the prefixes and suffixes for date tagger class
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
  * replaced text tag with era tag for the date class
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  * Removed the text tag reference from date class verbalizer
    Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>

  ---------

  Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update jenkins cache
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* Potential fix for code scanning alert no. 821: Unused local variable
  Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
  Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>

---------

Signed-off-by: Namrata Gachchi <ngachchi@nvidia.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Namrata Gachchi <ngachchi@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
1 parent cfc9627 commit 3de4021

4 files changed, +21 -6 lines changed

Jenkinsfile

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ pipeline {
     HY_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-0'
     MR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/03-12-24-1'
     JA_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/10-17-24-1'
-    HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-12-25-0'
+    HI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/04-22-25-0'
     DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
   }
   stages {

nemo_text_processing/text_normalization/hi/taggers/measure.py

Lines changed: 14 additions & 2 deletions
@@ -62,15 +62,27 @@ def __init__(self, cardinal: GraphFst, decimal: GraphFst):
         )

         # Define the quarterly measurements
-        quarter = pynini.string_map([(".५", "साढ़े"), ("१.५", "डेढ़"), ("२.५", "ढाई"),])
+        quarter = pynini.string_map(
+            [
+                (".५", "साढ़े"),
+                ("१.५", "डेढ़"),
+                ("२.५", "ढाई"),
+            ]
+        )
         quarter_graph = pynutil.insert("integer_part: \"") + quarter + pynutil.insert("\"")

         # Define the unit handling
         unit = pynutil.insert(" units: \"") + unit_graph + pynutil.insert("\" ")
         units = pynutil.insert(" units: \"") + quarterly_units_graph + pynutil.insert("\" ")

         # Handling symbols like x, X, *
-        symbol_graph = pynini.string_map([("x", "बाई"), ("X", "बाई"), ("*", "बाई"),])
+        symbol_graph = pynini.string_map(
+            [
+                ("x", "बाई"),
+                ("X", "बाई"),
+                ("*", "बाई"),
+            ]
+        )

         graph_decimal = (
             pynutil.insert("decimal { ")
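For readers unfamiliar with the tagger, the following is a minimal sketch of what the `quarter` mapping in the hunk above expresses, using a plain dictionary instead of a pynini FST. The names `HALF_QUANTITIES` and `tag_half_quantity` are hypothetical and do not appear in the commit; the sketch only handles exact matches and is not the NeMo implementation.

```python
from typing import Optional

# Illustrative stand-in for the `quarter` string_map shown in the diff.
# A dict lookup replaces the FST; this is an assumption for demonstration.
HALF_QUANTITIES = {
    ".५": "साढ़े",   # "and a half"
    "१.५": "डेढ़",   # one and a half
    "२.५": "ढाई",    # two and a half
}


def tag_half_quantity(text: str) -> Optional[str]:
    """Return the tagged integer_part for an exact match, otherwise None."""
    word = HALF_QUANTITIES.get(text)
    if word is None:
        return None
    # Mirrors the inserts around `quarter` in `quarter_graph`.
    return f'integer_part: "{word}"'
```

Calling `tag_half_quantity("२.५")` yields the `integer_part` field carrying the Hindi word for two and a half, while any input outside the three listed keys returns `None`.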

nemo_text_processing/text_normalization/hi/taggers/money.py

Lines changed: 4 additions & 2 deletions
@@ -24,8 +24,10 @@
 class MoneyFst(GraphFst):
     """
     Finite state transducer for classifying money, suppletive aware, e.g.
-        ₹1 -> money { currency: "रुपए" integer_part: "एक" }
-        ₹1.2 -> money { currency: "रुपए" integer_part: "एक" fractional_part: "दो" }
+        ₹५० -> money { currency_maj: "रुपए" integer_part: "पचास" }
+        ₹५०.५० -> money { currency_maj: "रुपए" integer_part: "पचास" fractional_part: "पचास" currency_min: "centiles" }
+        ₹०.५० -> money { currency_maj: "रुपए" integer_part: "शून्य" fractional_part: "पचास" currency_min: "centiles" }
+        Note that the 'centiles' string is a placeholder to be handled by the verbalizer, which applies the corresponding minor currency denomination

     Args:
         cardinal: CardinalFst
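The docstring above says the verbalizer replaces the `centiles` placeholder with the matching minor denomination. A rough sketch of that substitution on already-parsed fields might look like the following. The function `resolve_minor`, the `MINOR_CURRENCY` table, and its single rupee-to-paise entry are assumptions made for illustration; the commit does not show the verbalizer.

```python
# Hypothetical table from major to minor currency names. Only the rupee
# entry is given, and it is an assumption rather than data from the commit.
MINOR_CURRENCY = {"रुपए": "पैसे"}


def resolve_minor(fields: dict) -> dict:
    """Swap the 'centiles' placeholder for the minor denomination, if known."""
    out = dict(fields)  # copy so the tagged input is left untouched
    if out.get("currency_min") == "centiles":
        major = out.get("currency_maj")
        # Leave the placeholder in place when the major currency is unknown.
        out["currency_min"] = MINOR_CURRENCY.get(major, "centiles")
    return out
```

Keeping a generic placeholder in the tagger output means the tagger does not need to know the minor unit for each currency; that lookup can live in one place on the verbalizer side.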

nemo_text_processing/text_normalization/hi/taggers/tokenize_and_classify.py

Lines changed: 2 additions & 1 deletion
@@ -69,7 +69,8 @@ def __init__(
             os.makedirs(cache_dir, exist_ok=True)
             whitelist_file = os.path.basename(whitelist) if whitelist else ""
             far_file = os.path.join(
-                cache_dir, f"hi_tn_{deterministic}_deterministic_{input_case}_{whitelist_file}_tokenize.far",
+                cache_dir,
+                f"hi_tn_{deterministic}_deterministic_{input_case}_{whitelist_file}_tokenize.far",
             )
         if not overwrite_cache and far_file and os.path.exists(far_file):
             self.fst = pynini.Far(far_file, mode="r")["tokenize_and_classify"]
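The change to this file is formatting only, so the cache file name is unchanged. To make the naming scheme concrete, here is a small standalone sketch that builds the same path with the expression from the hunk. The helper name `far_path` and the example argument values are hypothetical.

```python
import os


def far_path(cache_dir, deterministic, input_case, whitelist=None):
    """Build the .far cache path using the same f-string as the diff."""
    # Only the whitelist file name (not its directory) enters the cache key.
    whitelist_file = os.path.basename(whitelist) if whitelist else ""
    return os.path.join(
        cache_dir,
        f"hi_tn_{deterministic}_deterministic_{input_case}_{whitelist_file}_tokenize.far",
    )


# Example with made-up arguments:
print(os.path.basename(far_path("cache", True, "cased")))
print(os.path.basename(far_path("cache", True, "cased", "data/wl.tsv")))
```

With no whitelist the name contains a double underscore, because `whitelist_file` is the empty string. Two whitelists that share a file name but live in different directories map to the same cache file.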
