Commit e29defa

Zh tn 0712 (#89)
* updates
* updates and fixes according to the document on the national guideline
* Decimal grammar added
* fraction updated
* money updated
* ordinal grammar added
* punctuation grammar added
* time grammar updated
* tokenizer updated
* updates on certificate
* data updated and added due to updates and changes to the existing grammar
* cardinal updated
* date grammar changed
* decimal grammar added
* grammar updated
* grammar updated
* grammar added
* grammar updates
* test data added
* test python file edits
* updates for tn1.0 and previous tn grammar from contribution
* test cases updated
* coding style fixed
* dates updated for init files
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* updated the date for zh
* removed unused imports
* removed comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* added back the itn tests
* added back measure and math from previous TN
* updated for test reruns
* updates
* [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci
* updated weights

---------

Signed-off-by: BuyuanCui <alexcui1994@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent 1138f78 commit e29defa

65 files changed

Lines changed: 2943 additions & 718 deletions

Jenkinsfile

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ pipeline {
 RU_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 VI_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
 SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'
-ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-29-23-0'
+ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/07-12-23-0'
 DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/06-08-23-0'

 }

nemo_text_processing/text_normalization/zh/data/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.

nemo_text_processing/text_normalization/zh/data/date/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+1
+2
+3
+4
+5
+6
+7
+8
+9
+01
+02
+03
+04
+05
+06
+07
+08
+09
+10
+11 十一
+12 十二
+13 十三
+14 十四
+15 十五
+16 十六
+17 十七
+18 十八
+19 十九
+20 二十
+21 二十一
+22 二十二
+23 二十三
+24 二十四
+25 二十五
+26 二十六
+27 二十七
+28 二十八
+29 二十九
+30 三十
+31 三十一
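
The 40 rows above pair each day-of-month token (1 to 31, plus the zero-padded forms 01 to 09) with a Chinese reading. As a rough illustration only, the same table can be generated in plain Python. The names `day_to_zh` and `build_day_table` are hypothetical and not part of this commit, and the readings for rows 1 to 10 and 01 to 09, which are not visible in the diff, are assumed here to be the standard numerals (一 through 十).

```python
DIGITS = "零一二三四五六七八九"


def day_to_zh(day):
    """Return the Chinese numeral reading for a day of the month (1-31)."""
    if not 1 <= day <= 31:
        raise ValueError(f"day out of range: {day}")
    tens, ones = divmod(day, 10)
    if tens == 0:
        return DIGITS[ones]
    # 10-19 start with a bare 十; 20 and above start with 二十 or 三十.
    prefix = "十" if tens == 1 else DIGITS[tens] + "十"
    return prefix + (DIGITS[ones] if ones else "")


def build_day_table():
    """Build a token -> reading table shaped like the TSV in the hunk above."""
    table = {}
    for day in range(1, 32):
        table[str(day)] = day_to_zh(day)
        if day < 10:
            # Assumption: zero-padded tokens read the same as unpadded ones.
            table[f"0{day}"] = day_to_zh(day)
    return table
```

Under these assumptions the generated table has 31 + 9 = 40 entries, which matches the 40 added lines reported for this file.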
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11 十一
+12 十二
+01
+02
+03
+04
+05
+06
+07
+08
+09
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+ad 公元
+AD 公元
+a.d. 公元
+A.D. 公元
+ce 公元
+CE 公元
+c.e. 公元
+C.E. 公元
+bc 公元前
+BC 公元前
+b.c. 公元前
+B.C. 公元前
+bce 公元前
+BCE 公元前
+b.c.e. 公元前
+B.C.E. 公元前
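
A table like this is a plain surface-form to reading lookup. The sketch below shows one way such a two-column TSV could be loaded and applied in plain Python. It is not how the grammars in this commit consume the file; `load_mapping` and `verbalize_era` are hypothetical names, `ERA_TSV` holds only a few rows copied from the hunk above, and the tab separator is assumed from the `.tsv` convention.

```python
# A few rows copied from the hunk above; the tab separator is an assumption.
ERA_TSV = (
    "ad\t公元\n"
    "A.D.\t公元\n"
    "CE\t公元\n"
    "bc\t公元前\n"
    "B.C.\t公元前\n"
    "BCE\t公元前\n"
)


def load_mapping(text):
    """Parse two-column TSV text into a dict, skipping malformed rows."""
    mapping = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:
            mapping[parts[0]] = parts[1]
    return mapping


def verbalize_era(token, mapping):
    """Return the Chinese era reading, or the token unchanged if unknown."""
    return mapping.get(token, token)


era = load_mapping(ERA_TSV)
print(verbalize_era("B.C.", era))
print(verbalize_era("CE", era))
```

The table lists case and punctuation variants explicitly, so the lookup is an exact match with no case folding.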
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+ad 公元
+AD 公元
+a.d. 公元
+A.D. 公元
+ce 公元
+CE 公元
+c.e. 公元
+C.E. 公元
+bc 公元前
+BC 公元前
+b.c. 公元前
+B.C. 公元前
+bce 公元前
+BCE 公元前
+b.c.e. 公元前
+B.C.E. 公元前

nemo_text_processing/text_normalization/zh/data/date/year_suffix.tsv

Lines changed: 0 additions & 6 deletions
This file was deleted.

nemo_text_processing/text_normalization/zh/data/measure/units_en.tsv

Lines changed: 2 additions & 0 deletions
@@ -88,3 +88,5 @@ mw 毫瓦
 pg 皮克
 ps 皮秒
 s
+ms 毫秒
+g

nemo_text_processing/text_normalization/zh/data/money/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
