Commit 4846bcc

tbartley94 and pre-commit-ci[bot] authored and committed
Ko itn staging v2 (#400)
* moving korean staging branch due to dco issues

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* import cleanup

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* missing jenkins

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* missing jenkins

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* migrating merge branch to main

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* removing fars

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* import issues

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* adding language tag

Signed-off-by: tbartley94 <tbartley@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: tbartley94 <tbartley@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 4a3b83b commit 4846bcc

145 files changed

Lines changed: 7943 additions & 1006 deletions

Jenkinsfile

Lines changed: 86 additions & 136 deletions
Large diffs are not rendered by default.

nemo_text_processing/inverse_text_normalization/inverse_normalize.py

Lines changed: 26 additions & 1 deletion
@@ -136,6 +136,13 @@ def __init__(
                 from nemo_text_processing.inverse_text_normalization.he.verbalizers.verbalize_final import (
                     VerbalizeFinalFst,
                 )
+            elif lang == 'ko':  # Korean
+                from nemo_text_processing.inverse_text_normalization.ko.taggers.tokenize_and_classify import ClassifyFst
+                from nemo_text_processing.inverse_text_normalization.ko.verbalizers.verbalize_final import (
+                    VerbalizeFinalFst,
+                )
+            else:
+                raise NotImplementedError(f"Language {lang} has not been supported yet.")
 
         self.tagger = ClassifyFst(
             cache_dir=cache_dir, whitelist=whitelist, overwrite_cache=overwrite_cache, input_case=input_case
@@ -180,7 +187,25 @@ def parse_args():
     parser.add_argument(
         "--language",
         help="language",
-        choices=['en', 'de', 'es', 'pt', 'ru', 'fr', 'sv', 'vi', 'ar', 'es_en', 'zh', 'he', 'hi', 'hy', 'mr', 'ja'],
+        choices=[
+            'en',
+            'de',
+            'es',
+            'pt',
+            'ru',
+            'fr',
+            'sv',
+            'vi',
+            'ar',
+            'es_en',
+            'zh',
+            'he',
+            'hi',
+            'hy',
+            'mr',
+            'ja',
+            'ko',
+        ],
         default="en",
         type=str,
     )
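
The two hunks above add 'ko' both to the language dispatch in __init__ and to the --language choices. A minimal, self-contained sketch of that pattern follows. The names SUPPORTED_LANGUAGES, build_parser and module_paths are hypothetical and not part of the commit, and the sketch assumes every language follows the package layout that the 'he' and 'ko' imports show.

```python
import argparse

# Language codes copied from the diff above; 'ko' is the newly added entry.
SUPPORTED_LANGUAGES = [
    'en', 'de', 'es', 'pt', 'ru', 'fr', 'sv', 'vi', 'ar',
    'es_en', 'zh', 'he', 'hi', 'hy', 'mr', 'ja', 'ko',
]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --language flag mirrors the one in the diff."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--language", help="language", choices=SUPPORTED_LANGUAGES, default="en", type=str)
    return parser


def module_paths(lang: str) -> tuple:
    """Hypothetical helper: return (tagger, verbalizer) module paths for lang.

    Assumes the <lang>.taggers / <lang>.verbalizers layout seen in the diff.
    """
    if lang not in SUPPORTED_LANGUAGES:
        # Same fallback message as the else branch added in the commit.
        raise NotImplementedError(f"Language {lang} has not been supported yet.")
    base = f"nemo_text_processing.inverse_text_normalization.{lang}"
    return (f"{base}.taggers.tokenize_and_classify", f"{base}.verbalizers.verbalize_final")


if __name__ == "__main__":
    args = build_parser().parse_args(["--language", "ko"])
    print(module_paths(args.language))
```

With argparse, an unlisted code is rejected at parse time, so the NotImplementedError branch only matters for callers that construct the normalizer directly with an unsupported lang value.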
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from nemo_text_processing.inverse_text_normalization.ko.taggers.tokenize_and_classify import ClassifyFst
+from nemo_text_processing.inverse_text_normalization.ko.verbalizers.verbalize import VerbalizeFst
+from nemo_text_processing.inverse_text_normalization.ko.verbalizers.verbalize_final import VerbalizeFinalFst
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+달러 $
+$
+유로
+¥
+파운드 £
+위안 ¥
+페소 $
+루피
+
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
+킬로미터 km
+미터 m
+센티미터 cm
+밀리미터 mm
+마이크로미터 μm
+나노미터 nm
+킬로그램 kg
+그램 g
+t
+밀리그램 mg
+마이크로그램 μg
+리터 L
+밀리리터 ml
+씨씨 cc
+시간 h
+min
+s
+뉴턴 N
+와트 W
+킬로와트 kW
+킬로와트시 kWh
+헤르츠 Hz
+킬로헤르츠 kHz
+메가헤르츠 MHz
+기가헤르츠 GHz
+°
+퍼센트 %
+프로 %
+분당회전수 rpm
+알피엠 rpm
+볼트 V
+밀리볼트 mV
+킬로볼트 kV
+암페어 A
+밀리암페어 mA
+py
+제곱미터
+제곱킬로미터 km²
+제곱센티미터 cm²
+세제곱미터
+기가바이트 GB
+기가 GB
+테라바이트 TB
+테라 TB
+메가바이트 MB
+메가 MB
+킬로바이트 KB
+바이트 B
+비트 bit
+칼로리 cal
+킬로칼로리 kcal
+J
+킬로줄 kJ
+마력 hp
+Ω
+파스칼 Pa
+헥토파스칼 hPa
+데시벨 dB
+루멘 lm
+럭스 lx
+픽셀 px
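
The table above pairs a spoken Korean unit name with its written abbreviation. As a rough illustration of what such a mapping encodes, the plain-Python sketch below loads a few of those rows and rewrites a trailing unit. It is not the commit's implementation: the tab separator, the sample string and the helper names load_mapping and abbreviate_unit are all assumptions made for this example.

```python
# A few rows copied from the table above. The tab separator is an assumption,
# and the last row deliberately has no written form, like some rows shown above.
SAMPLE_TSV = "킬로미터\tkm\n미터\tm\n센티미터\tcm\n킬로그램\tkg\n퍼센트\t%\n프로\t%\n제곱미터\n"


def load_mapping(tsv_text: str) -> dict:
    """Parse 'spoken<TAB>written' rows, skipping rows that lack either field."""
    mapping = {}
    for line in tsv_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 2 or not all(parts):
            continue  # incomplete row: nothing reliable to map
        spoken, written = parts
        mapping[spoken] = written
    return mapping


def abbreviate_unit(text: str, mapping: dict) -> str:
    """Replace a trailing spoken unit with its abbreviation.

    Longer names are tried first so that '킬로미터' wins over its suffix '미터'.
    """
    for spoken in sorted(mapping, key=len, reverse=True):
        if text.endswith(spoken):
            return text[: -len(spoken)] + mapping[spoken]
    return text


if __name__ == "__main__":
    units = load_mapping(SAMPLE_TSV)
    print(abbreviate_unit("5킬로미터", units))
    print(abbreviate_unit("10프로", units))
```

Trying longer names first matters because several entries are suffixes of others, for example 미터 inside 킬로미터 and 센티미터.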
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+십일 11
+십이 12
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+1
+2
+3
+4
+5
+6
+7
+8
+9
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+0
