Commit 2c22d73

PR: Add Vietnamese text normalization for cardinal semiotic class (#289)
* Add Vietnamese text normalization for cardinal semiotic class
  Signed-off-by: folivoramanh <palasek182@gmail.com>
* Add missing init file
  Signed-off-by: folivoramanh <palasek182@gmail.com>
* Fix Cardinal and optimize logic
  Signed-off-by: folivoramanh <palasek182@gmail.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
---------
Signed-off-by: folivoramanh <palasek182@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent ac07488 commit 2c22d73

30 files changed, +1034 -17 lines changed

Jenkinsfile

Lines changed: 6 additions & 1 deletion
@@ -176,7 +176,7 @@ pipeline {
       }
     }
 
-    stage('L0: Create FR TN/ITN & VI ITN & HU TN & IT TN') {
+    stage('L0: Create FR TN/ITN & VI TN/ITN & HU TN & IT TN') {
       when {
         anyOf {
           branch 'main'
@@ -200,6 +200,11 @@ pipeline {
             sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/inverse_text_normalization/inverse_normalize.py --lang=vi --text="một ngàn " --cache_dir ${VI_TN_CACHE}'
           }
         }
+        stage('L0: VI TN grammars') {
+          steps {
+            sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=vi --text="100" --cache_dir ${VI_TN_CACHE}'
+          }
+        }
         stage('L0: HU TN grammars') {
           steps {
             sh 'CUDA_VISIBLE_DEVICES="" python nemo_text_processing/text_normalization/normalize.py --lang=hu --text="100" --cache_dir ${HU_TN_CACHE}'

nemo_text_processing/text_normalization/normalize.py

Lines changed: 3 additions & 0 deletions
@@ -174,6 +174,9 @@ def __init__(
         elif lang == 'ja':
             from nemo_text_processing.text_normalization.ja.taggers.tokenize_and_classify import ClassifyFst
             from nemo_text_processing.text_normalization.ja.verbalizers.verbalize_final import VerbalizeFinalFst
+        elif lang == 'vi':
+            from nemo_text_processing.text_normalization.vi.taggers.tokenize_and_classify import ClassifyFst
+            from nemo_text_processing.text_normalization.vi.verbalizers.verbalize_final import VerbalizeFinalFst
         else:
             raise NotImplementedError(f"Language {lang} has not been supported yet.")

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+1 một
+2 hai
+3 ba
+4 bốn
+5 năm
+6 sáu
+7 bảy
+8 tám
+9 chín
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+1 một mốt
+4 bốn tư
+5 năm lăm
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+thousand nghìn
+million triệu
+billion tỷ
+hundred trăm
+linh linh
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+10 mười
+11 mười một
+12 mười hai
+13 mười ba
+14 mười bốn
+15 mười lăm
+16 mười sáu
+17 mười bảy
+18 mười tám
+19 mười chín
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+2 hai mươi
+3 ba mươi
+4 bốn mươi
+5 năm mươi
+6 sáu mươi
+7 bảy mươi
+8 tám mươi
+9 chín mươi
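
Note: the grammar code that consumes these data files is not shown in this excerpt, so how the commit combines them is not confirmed here. As a rough illustration only, the plain-Python sketch below composes the tables above to read out 1 to 999. The function name `verbalize_below_1000` is hypothetical, and two rules are assumptions based on common Vietnamese usage rather than on the commit: the special forms (mốt, tư, lăm) replace the unit digit after a tens word, and `linh` bridges a hundred and a lone unit digit.

```python
# Tables copied from the data files in this diff.
DIGIT = {1: "một", 2: "hai", 3: "ba", 4: "bốn", 5: "năm",
         6: "sáu", 7: "bảy", 8: "tám", 9: "chín"}
TEEN = {10: "mười", 11: "mười một", 12: "mười hai", 13: "mười ba",
        14: "mười bốn", 15: "mười lăm", 16: "mười sáu", 17: "mười bảy",
        18: "mười tám", 19: "mười chín"}
TENS = {2: "hai mươi", 3: "ba mươi", 4: "bốn mươi", 5: "năm mươi",
        6: "sáu mươi", 7: "bảy mươi", 8: "tám mươi", 9: "chín mươi"}
HUNDRED = "trăm"
LINH = "linh"

# Assumption: these forms are used for the unit digit after a tens word,
# e.g. 21 -> "hai mươi mốt" rather than "hai mươi một".
SPECIAL_UNIT = {1: "mốt", 4: "tư", 5: "lăm"}


def verbalize_below_1000(n: int) -> str:
    """Hypothetical helper: spell out an integer in the range 1..999."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only covers 1..999")
    hundreds, rest = divmod(n, 100)
    words = []
    if hundreds:
        words += [DIGIT[hundreds], HUNDRED]
    if rest == 0:
        pass  # round hundreds need nothing more
    elif rest < 10:
        # Assumption: "linh" fills the empty tens slot after a hundred.
        if hundreds:
            words.append(LINH)
        words.append(DIGIT[rest])
    elif rest < 20:
        words.append(TEEN[rest])
    else:
        tens, unit = divmod(rest, 10)
        words.append(TENS[tens])
        if unit:
            words.append(SPECIAL_UNIT.get(unit, DIGIT[unit]))
    return " ".join(words)


if __name__ == "__main__":
    for value in (15, 21, 100, 105, 999):
        print(value, verbalize_below_1000(value))
```

The larger magnitudes in the table (nghìn, triệu, tỷ) are left out of this sketch; the commit's grammars may handle them, and the edge cases above, differently.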
