Commit cb53beb

En names (#42)
* Add support for Financial year and for years between 1000 BC and 1000AD
* Add support for product names and add abbreviations to whitelist
* Add weights for some sequences, exclude 'a' before numeric sequence
* Add tests
* Update cache folder for EN
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Update FR Cache path
* Move text to TSV files, and some code cleanup
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Add additional vocabulary, allow singular usage of units to support adjective phrases
* Fix issue with whitelist loader not handling weights correctly
  Move cased loader file to graph_utils
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* insert space between value and unit
* Insert space between measurement and unit. Adjust weight for ordinal
* Update tests

---------

Signed-off-by: Anand Joseph <anajoseph@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
1 parent f806134 commit cb53beb

18 files changed

Lines changed: 625 additions & 89 deletions

Jenkinsfile

Lines changed: 3 additions & 1 deletion
@@ -10,9 +10,10 @@ pipeline {
 disableConcurrentBuilds(abortPrevious: true)
 }
 environment {
+
 AR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
 DE_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
-EN_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-16-23-0'
+EN_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-18-23-1'
 ES_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
 FR_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-16-23-1'
 PT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
@@ -21,6 +22,7 @@ pipeline {
 SV_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
 ZH_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
 DEFAULT_TN_CACHE='/home/jenkinsci/TestData/text_norm/ci/grammars/02-15-23-0'
+
 }
 stages {

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+Q1 first quarter
+Q1 q one
+Q2 second quarter
+Q2 q two
+Q3 third quarter
+Q3 q three
+Q4 fourth quarter
+Q4 q four
+H1 first half
+H2 second half
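
Each row above pairs a written form with one spoken form, so two spoken variants ("first quarter", "q one") can resolve to the same written token. The following is a minimal sketch of how such rows could be inverted into a spoken-to-written lookup. It assumes the two columns are tab-separated in the real file, and `load_spoken_to_written` is a hypothetical helper for illustration, not the loader used in this commit.

```python
# Sample rows copied from the diff above; the tab separator is an assumption.
SAMPLE_TSV = "Q1\tfirst quarter\nQ1\tq one\nH1\tfirst half\nH2\tsecond half\n"


def load_spoken_to_written(tsv_text):
    """Invert written<TAB>spoken rows into a spoken -> written dict (hypothetical helper)."""
    mapping = {}
    for line in tsv_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        written, spoken = parts[0], parts[1]
        mapping[spoken] = written
    return mapping


lookup = load_spoken_to_written(SAMPLE_TSV)
```

With this lookup, both `lookup["first quarter"]` and `lookup["q one"]` resolve to `Q1`.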

nemo_text_processing/inverse_text_normalization/en/data/measurements.tsv

Lines changed: 19 additions & 8 deletions
@@ -14,8 +14,14 @@ km² square kilometer
 ft foot
 % percent
 hz hertz
-kw kilowatt
+kW kilowatt
+kW kilo watt
+kWh kilo watt hour
+kWh kilowatt hour
+Wh watt hour
+W watt
 hp horsepower
+hp horse power
 mg milligram
 kg kilogram
 ghz gigahertz
@@ -30,12 +36,12 @@ rpm revolution per minute
 min minute
 mA milli ampere
 % per cent
-kwh kilo watt hour
 cubic meter
 mph mile per hour
-tw tera watt
+tW tera watt
 mv milli volt
-mw megawatt
+mW megawatt
+mW mega watt
 μm micrometer
 " inch
 cc c c
@@ -86,6 +92,7 @@ kl kilo liter
 tj tera joule
 kv kilo volt
 mv mega volt
+kn kilo newton
 kn kilonewton
 mm megameter
 au astronomical unit
@@ -96,8 +103,12 @@ hs hecto second
 mol mole
 gpa giga pascal
 ml milliliter
-gw gigawatt
-ma mega ampere
+gW gigawatt
+gW gigaWatt
+A ampere
+mA mili ampere
+µA micro ampere
+MA mega ampere
 kt knot
 kgf kilogram force
 ng nano gram
@@ -106,7 +117,7 @@ ms mega siemens
 bar bar
 gl giga liter
 μs microsecond
-da deci ampere
+dA deci ampere
 pa pascal
 ds deci second
 ms milli second
@@ -126,7 +137,7 @@ tl tera liter
 ms mega second
 mpa megapascal
 pm peta meter
-gwh giga watt hour
+gWh giga watt hour
 kcal kilo calory
 gy gray
 sv sievert
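
The measurement rows above make unit symbols case-sensitive (`mA` for "milli ampere" versus `MA` for "mega ampere"), and the commit message notes that a space is now inserted between a value and its unit. The sketch below illustrates both points under stated assumptions: the rows are copied from the diff, while `SPOKEN_TO_UNIT` and `format_measurement` are hypothetical names and a plain dictionary stands in for the grammar-based implementation in the repository.

```python
# (written, spoken) rows copied from the measurements.tsv diff above.
UNIT_ROWS = [
    ("kWh", "kilo watt hour"),
    ("kWh", "kilowatt hour"),
    ("mA", "milli ampere"),
    ("MA", "mega ampere"),
]

# Spoken form -> written symbol; the symbol's casing is preserved as written.
SPOKEN_TO_UNIT = {spoken: written for written, spoken in UNIT_ROWS}


def format_measurement(value, spoken_unit):
    """Join a numeric string and a unit symbol with a single space (hypothetical helper)."""
    unit = SPOKEN_TO_UNIT[spoken_unit]
    return f"{value} {unit}"
```

For example, `format_measurement("10", "kilowatt hour")` and `format_measurement("10", "kilo watt hour")` both produce `10 kWh`, while "milli ampere" and "mega ampere" stay distinct because the symbols differ only in case.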
