Commit 6e5601d

mgrafu, folivoramanh, and pre-commit-ci[bot] authored
Staging pt_br TN to main (#421)
* Add Portuguese (PT) text normalization: cardinal, ordinal, decimal, fraction (#403)

  * Add Portuguese (PT) text normalization: cardinal, ordinal, decimal, fraction
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * date and time semiotic classese
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * update sh files
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * Update Portuguese text normalization tutorial with enhanced examples and outputs
    - Changed the language parameter in the Normalizer instance from 'en' to 'pt'.
    - Added detailed output examples for the normalizer's methods, including documentation for `__doc__` and `normalize()`.
    - Updated example input string to reflect a more complex Portuguese sentence for normalization.
    - Adjusted execution counts for code cells to ensure proper order of execution.
    This update aims to improve the clarity and usability of the tutorial for Portuguese text normalization.
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * remove current unuse file
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci

  * update minor update and punct
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci

  ---------
  Signed-off-by: Mai Anh <palasek182@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* PT TN: money, measure, telephone, electronic (#416)

  * PT TN: money, measure, telephone, electronic
    Adds semiotic classes and tests on top of staging/pt-br_tn; includes cardinal fix for X00 + 01–09 and Sparrowhawk script updates.
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci

  * fix bugs based on review cardinal, fraction, money, measure
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * [pre-commit.ci] auto fixes from pre-commit.com hooks
    for more information, see https://pre-commit.ci

  * modify test case time
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  * modify with mariana's review
    Signed-off-by: Mai Anh <palasek182@gmail.com>

  ---------
  Signed-off-by: Mai Anh <palasek182@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* update jenkins cache
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

* add init to whitelist
  Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>

---------
Signed-off-by: Mai Anh <palasek182@gmail.com>
Signed-off-by: Mariana Graterol Fuenmayor <marianag@nvidia.com>
Signed-off-by: Mariana <47233618+mgrafu@users.noreply.github.com>
Co-authored-by: Mai Anh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 1a38df0 commit 6e5601d

107 files changed

Lines changed: 4647 additions & 17 deletions

Jenkinsfile

Lines changed: 11 additions & 11 deletions
@@ -17,7 +17,7 @@ pipeline {
 ES_EN_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/08-30-24-0'
 FR_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/04-07-25-0'
 HU_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/07-16-24-0'
-PT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
+PT_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/05-01-26-1'
 RU_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
 VI_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/10-29-25-0'
 SV_TN_CACHE='/home/jenkins/TestData/text_norm/ci/grammars/06-08-23-0'
@@ -242,16 +242,16 @@ pipeline {
 sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=sv --text="100" --cache_dir ${SV_TN_CACHE}'
 }
 }
-// stage('L0: SV ITN grammars') {
-// steps {
-// sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=sv --text="hundra " --cache_dir ${SV_TN_CACHE}'
-// }
-// }
-// stage('L0: PT TN grammars') {
-// steps {
-// sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=pt --text="2" --cache_dir ${DEFAULT_TN_CACHE}'
-// }
-// }
+// stage('L0: SV ITN grammars') {
+// steps {
+// sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=sv --text="hundra " --cache_dir ${SV_TN_CACHE}'
+// }
+// }
+stage('L0: PT TN grammars') {
+steps {
+sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=pt --text="2" --cache_dir ${PT_TN_CACHE}'
+}
+}
 stage('L0: PT ITN grammars') {
 steps {
 sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=pt --text="dez " --cache_dir ${PT_TN_CACHE}'

nemo_text_processing/text_normalization/normalize.py

Lines changed: 4 additions & 1 deletion
@@ -185,6 +185,9 @@ def __init__(
 
 if post_process:
 self.post_processor = PostProcessingFst(cache_dir=cache_dir, overwrite_cache=overwrite_cache)
+elif lang == 'pt':
+from nemo_text_processing.text_normalization.pt.taggers.tokenize_and_classify import ClassifyFst
+from nemo_text_processing.text_normalization.pt.verbalizers.verbalize_final import VerbalizeFinalFst
 elif lang == 'ko':
 from nemo_text_processing.text_normalization.ko.taggers.tokenize_and_classify import ClassifyFst
 from nemo_text_processing.text_normalization.ko.verbalizers.verbalize_final import VerbalizeFinalFst
@@ -734,7 +737,7 @@ def parse_args():
 parser.add_argument(
 "--language",
 help="language",
-choices=["en", "de", "es", "fr", "hu", "sv", "zh", "ar", "it", "hy", "ja", "hi", "ko", "vi"],
+choices=["en", "de", "es", "fr", "hu", "sv", "zh", "ar", "it", "hy", "ja", "hi", "ko", "vi", "pt"],
 default="en",
 type=str,
 )
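
The second hunk only extends the `choices` list, while the Jenkins stage calls the script with `--lang=pt` rather than `--language pt`. A minimal sketch of why both spellings can reach the same option: argparse accepts any unambiguous prefix of a long option by default. Here `build_parser` is a hypothetical stand-in that models only this one argument, and it assumes no other option in the real parser starts with `--lang`.

```python
import argparse

# Languages copied from the updated `choices` list in the hunk above.
LANGUAGES = ["en", "de", "es", "fr", "hu", "sv", "zh", "ar", "it", "hy", "ja", "hi", "ko", "vi", "pt"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for parse_args(); only --language is modeled."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--language", help="language", choices=LANGUAGES, default="en", type=str)
    return parser


parser = build_parser()
full = parser.parse_args(["--language", "pt"])
# `--lang` is a unique prefix of `--language`, so argparse resolves it.
short = parser.parse_args(["--lang=pt"])
print(full.language, short.language)  # pt pt
```

Before this commit, `pt` was absent from `choices`, so the same call would have been rejected by argparse with a usage error.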
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+1 janeiro
+01 janeiro
+2 fevereiro
+02 fevereiro
+3 março
+03 março
+4 abril
+04 abril
+5 maio
+05 maio
+6 junho
+06 junho
+7 julho
+07 julho
+8 agosto
+08 agosto
+9 setembro
+09 setembro
+10 outubro
+11 novembro
+12 dezembro
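
The data file above pairs each month number, with and without a leading zero, with its Portuguese name. A minimal sketch of how such rows behave as a lookup table, using a plain Python dict: the commit's grammars presumably compile the file into an FST instead, and the name `load_mapping`, the whitespace-separated layout, and the excerpted rows are all assumptions made for illustration.

```python
from typing import Dict

# Excerpt of the rows shown above; the separator is assumed to be whitespace.
MONTH_ROWS = """\
1 janeiro
01 janeiro
3 março
03 março
10 outubro
12 dezembro
"""


def load_mapping(text: str) -> Dict[str, str]:
    """Parse 'key value' rows into a dict (hypothetical helper)."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(maxsplit=1)
        mapping[key] = value.strip()
    return mapping


months = load_mapping(MONTH_ROWS)
print(months["3"], months["03"], months["12"])  # março março dezembro
```

Listing both `3` and `03` as keys means zero-padded and unpadded month numbers resolve to the same name without any extra normalization step.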
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+/
+.
+-
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+preposition de
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+.com ponto com
+.com.br ponto com ponto br
+.gov.br ponto gov ponto br
+.org ponto org
+.net ponto net
+.edu ponto edu
+.br ponto br
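
Several suffixes in the table above overlap: `.com.br` ends with `.br` and begins with `.com`. A minimal sketch of one way to resolve the overlap is to try the longest suffix first. Whether the commit's electronic grammar actually applies a longest-match rule is an assumption, and `split_domain` is a hypothetical helper, not part of the library.

```python
from typing import Optional, Tuple

# Suffix-to-spoken-form pairs copied from the data file above.
DOMAINS = {
    ".com": "ponto com",
    ".com.br": "ponto com ponto br",
    ".gov.br": "ponto gov ponto br",
    ".org": "ponto org",
    ".net": "ponto net",
    ".edu": "ponto edu",
    ".br": "ponto br",
}


def split_domain(host: str) -> Optional[Tuple[str, str]]:
    """Return (stem, spoken suffix) using the longest listed suffix, or None."""
    for suffix in sorted(DOMAINS, key=len, reverse=True):
        if host.endswith(suffix) and len(host) > len(suffix):
            return host[: -len(suffix)], DOMAINS[suffix]
    return None


print(split_domain("nvidia.com.br"))  # ('nvidia', 'ponto com ponto br')
print(split_domain("exemplo.org"))    # ('exemplo', 'ponto org')
print(split_domain("localhost"))      # None
```

Trying `.br` before `.com.br` would leave `nvidia.com` as the stem, which is why the suffixes are sorted by length before matching.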
