Commit 729a5ba

IWSLT-Ta ASR/ST (#1362)
This pull request adds ASR and ST recipes for the Dialectal IWSLT-Tunisian 2022 shared task (https://iwslt.org/2022/dialect).
1 parent 855536d commit 729a5ba

127 files changed

Lines changed: 28855 additions & 1 deletion

egs/iwslt22_ta/ASR/README.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
# IWSLT_Ta

The IWSLT Tunisian dataset is a 3-way parallel dataset consisting of approximately 160 hours
and 200,000 lines of aligned audio, Tunisian transcripts, and English translations. This dataset
comprises conversational telephone speech recorded at a sampling rate of 8 kHz. The train, dev,
and test1 splits of the IWSLT 2022 shared task correspond to catalog number LDC2022E01. Please
note that access to this data requires an LDC subscription from your institution. To prepare the
dataset, download the predefined splits by running the following command:
`git clone https://github.com/kevinduh/iwslt22-dialect.git`. For more detailed information about
the shared task, please refer to the task paper:
https://aclanthology.org/2022.iwslt-1.10/.

## Stateless Pruned Transducer Performance Record (after 20 epochs)

| Decoding method                    | dev WER    | test WER   | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search               | 47.6       | 51.2       | --epoch 20, --avg 10                     |

## Zipformer Performance Record (after 20 epochs)

| Decoding method                    | dev WER    | test WER   | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search               | 40.8       | 44.4       | --epoch 20, --avg 10                     |

See [RESULTS](RESULTS.md) for details.
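
The tables above report word error rate (WER) in percent. As a reminder of what that number measures, here is a minimal sketch that computes WER for a single reference/hypothesis pair as the word-level edit distance divided by the reference length. The function name `word_error_rate` is hypothetical and this is not the scoring code used by the recipe; corpus-level WER is normally obtained by summing errors and reference words over all utterances rather than averaging per-utterance values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent for one utterance (illustrative helper only)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) against 4 reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A dev WER of 40.8 therefore means roughly 41 word-level errors (substitutions, deletions, and insertions) per 100 reference words.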

egs/iwslt22_ta/ASR/RESULTS.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# Results

### IWSLT Tunisian training results (Stateless Pruned Transducer)

#### 2023-06-01

| Decoding method                    | dev WER    | test WER   | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search               | 47.6       | 51.2       | --epoch 20, --avg 13                     |

The training command for reproducing is given below:

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless5/exp \
  --max-duration 300 \
  --num-buckets 50
```

The tensorboard training log can be found at
https://tensorboard.dev/experiment/yBijWJSPSGuBqMwTZ509lA/

The decoding command is:
```
for method in modified_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch 15 \
    --beam-size 20 \
    --avg 5 \
    --exp-dir ./pruned_transducer_stateless5/exp \
    --max-duration 400 \
    --decoding-method $method \
    --max-sym-per-frame 1 \
    --num-encoder-layers 12 \
    --dim-feedforward 1024 \
    --nhead 8 \
    --encoder-dim 256 \
    --decoder-dim 256 \
    --joiner-dim 256 \
    --use-averaged-model true
done
```

### IWSLT Tunisian training results (Zipformer)

#### 2023-06-01

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/AmirHussein/zipformer-iwslt22-Ta>

| Decoding method                    | dev WER    | test WER   | comment                                  |
|------------------------------------|------------|------------|------------------------------------------|
| modified beam search               | 40.8       | 44.1       | --epoch 20, --avg 13                     |

To reproduce the above result, use the following commands for training:

Note: the model was trained on V100 32GB GPUs.

```
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
  --world-size 4 \
  --num-epochs 20 \
  --start-epoch 1 \
  --use-fp16 1 \
  --exp-dir zipformer/exp \
  --causal 0 \
  --num-encoder-layers 2,2,2,2,2,2 \
  --feedforward-dim 512,768,1024,1536,1024,768 \
  --encoder-dim 192,256,384,512,384,256 \
  --encoder-unmasked-dim 192,192,256,256,256,192 \
  --max-duration 800 \
  --prune-range 10
```
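
The training commands pair `CUDA_VISIBLE_DEVICES` with `--world-size`, which sets the number of distributed training processes (typically one per GPU). A minimal sketch of a sanity check that could be run before launching training is shown below. The helper names `visible_gpu_count` and `check_world_size` are hypothetical and are not part of icefall; the check assumes one process per visible device.

```python
import os
from typing import Mapping, Optional


def visible_gpu_count(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Count the devices listed in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, since in that case the
    process is not restricted to a subset of devices.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def check_world_size(world_size: int, env: Optional[Mapping[str, str]] = None) -> None:
    """Raise ValueError if more processes are requested than devices are visible."""
    count = visible_gpu_count(env)
    if count is not None and world_size > count:
        raise ValueError(
            f"--world-size {world_size} exceeds the {count} device(s) "
            "listed in CUDA_VISIBLE_DEVICES"
        )


if __name__ == "__main__":
    # Four visible devices are enough for four training processes.
    check_world_size(4, {"CUDA_VISIBLE_DEVICES": "0,1,2,3"})
    print("world size is consistent with the visible devices")
```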

The decoding command is:

```
for method in modified_beam_search; do
  ./zipformer/decode.py \
    --epoch 20 \
    --beam-size 20 \
    --avg 13 \
    --exp-dir ./zipformer/exp \
    --max-duration 800 \
    --decoding-method $method \
    --num-encoder-layers 2,2,2,2,2,2 \
    --feedforward-dim 512,768,1024,1536,1024,768 \
    --encoder-dim 192,256,384,512,384,256 \
    --encoder-unmasked-dim 192,192,256,256,256,192 \
    --use-averaged-model true
done
```

egs/iwslt22_ta/ASR/local/__init__.py

Whitespace-only changes.

egs/iwslt22_ta/ASR/local/cer.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../ST/local/cer.py
Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
#!/usr/bin/env python3
# Johns Hopkins University (authors: Amir Hussein)
#
# See ../../../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


"""
This file computes fbank features of the IWSLT Tunisian dataset.
It looks for manifests in the directory data/manifests.

The generated fbank features are saved in data/fbank.
"""

import argparse
import logging
import os
from pathlib import Path

import torch
from lhotse import CutSet, LilcomChunkyWriter
from lhotse.features.kaldifeat import (
    KaldifeatFbank,
    KaldifeatFbankConfig,
    KaldifeatFrameOptions,
    KaldifeatMelOptions,
)
from lhotse.recipes.utils import read_manifests_if_cached

# Torch's multithreaded behavior needs to be disabled or
# it wastes a lot of CPU and slows things down.
# Do this outside of main() in case it needs to take effect
# even when we are not invoking the main (e.g. when spawning subprocesses).
torch.set_num_threads(1)
torch.set_num_interop_threads(1)

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--num-splits",
        type=int,
        default=20,
        help="Number of splits for the train set.",
    )
    parser.add_argument(
        "--start",
        type=int,
        default=0,
        help="Start index of the train set split.",
    )
    parser.add_argument(
        "--stop",
        type=int,
        default=-1,
        help="Stop index of the train set split.",
    )
    parser.add_argument(
        "--test",
        action="store_true",
        help="If set, only compute features for the dev and test1 sets.",
    )

    return parser.parse_args()


def compute_fbank_gpu(args):
    src_dir = Path("data/manifests")
    output_dir = Path("data/fbank")
    output_dir.mkdir(parents=True, exist_ok=True)
    num_jobs = os.cpu_count()
    num_mel_bins = 80
    sampling_rate = 16000
    sr = 16000

    dataset_parts = ("dev", "test1") if args.test else ("train", "test1", "dev")
    manifests = read_manifests_if_cached(
        prefix="iwslt-ta", dataset_parts=dataset_parts, output_dir=src_dir
    )
    assert manifests is not None

    # Features are extracted on the GPU.
    extractor = KaldifeatFbank(
        KaldifeatFbankConfig(
            frame_opts=KaldifeatFrameOptions(sampling_rate=sampling_rate),
            mel_opts=KaldifeatMelOptions(num_bins=num_mel_bins),
            device="cuda",
        )
    )

    for partition, m in manifests.items():
        if (output_dir / f"cuts_{partition}.jsonl.gz").is_file():
            logging.info(f"{partition} already exists - skipping.")
            continue
        logging.info(f"Processing {partition}")
        cut_set = CutSet.from_manifests(
            recordings=m["recordings"],
            supervisions=m["supervisions"],
        )

        logging.info("About to split cuts into smaller chunks.")
        if sr is not None:
            logging.info(f"Resampling to {sr}")
            cut_set = cut_set.resample(sr)

        cut_set = cut_set.trim_to_supervisions(
            keep_overlapping=False,
            keep_all_channels=False,
        )
        # Keep only cuts between 0.2 and 30 seconds long.
        cut_set = cut_set.filter(lambda c: 0.2 <= c.duration <= 30)
        if "train" in partition:
            # Augment the training data with 0.9x and 1.1x speed perturbation.
            cut_set = (
                cut_set
                + cut_set.perturb_speed(0.9)
                + cut_set.perturb_speed(1.1)
            )
            cut_set = cut_set.to_eager()
            chunk_size = len(cut_set) // args.num_splits
            cut_sets = cut_set.split_lazy(
                output_dir=src_dir / f"cuts_train_raw_split{args.num_splits}",
                chunk_size=chunk_size,
            )
            start = args.start
            stop = min(args.stop, args.num_splits) if args.stop > 0 else args.num_splits
            num_digits = len(str(args.num_splits))

            # Process the selected range of splits; file names are 1-based
            # and zero-padded.
            for i in range(start, stop):
                idx = f"{i + 1}".zfill(num_digits)
                cuts_train_idx_path = src_dir / f"cuts_train_{idx}.jsonl.gz"
                logging.info(f"Processing train split {i}")
                cs = cut_sets[i].compute_and_store_features_batch(
                    extractor=extractor,
                    storage_path=output_dir / f"feats_train_{idx}",
                    batch_duration=1000,
                    num_workers=8,
                    storage_type=LilcomChunkyWriter,
                    overwrite=True,
                )
                cs.to_file(cuts_train_idx_path)
        else:
            logging.info(f"Processing {partition}")
            cut_set = cut_set.compute_and_store_features_batch(
                extractor=extractor,
                storage_path=output_dir / f"feats_{partition}",
                batch_duration=1000,
                num_workers=10,
                storage_type=LilcomChunkyWriter,
                overwrite=True,
            )
            cut_set.to_file(output_dir / f"cuts_{partition}.jsonl.gz")

if __name__ == "__main__":
    formatter = (
        "%(asctime)s %(levelname)s [%(filename)s:%(lineno)d] %(message)s"
    )

    logging.basicConfig(format=formatter, level=logging.INFO)
    args = get_args()

    compute_fbank_gpu(args)
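
The train branch of `compute_fbank_gpu` selects a range of splits from `--start` and `--stop` and names each output with a 1-based, zero-padded index. The sketch below reproduces only that index arithmetic in isolation so the resulting file names are easy to check; the helper name `train_split_file_names` is hypothetical and does not exist in the recipe.

```python
from typing import List


def train_split_file_names(num_splits: int, start: int = 0, stop: int = -1) -> List[str]:
    """List the manifest names the script would write for the given arguments.

    Mirrors the script's logic: a non-positive stop means "up to the last
    split", and the index in the file name is i + 1, padded to the width
    of num_splits.
    """
    stop = min(stop, num_splits) if stop > 0 else num_splits
    num_digits = len(str(num_splits))
    return [
        f"cuts_train_{str(i + 1).zfill(num_digits)}.jsonl.gz"
        for i in range(start, stop)
    ]


if __name__ == "__main__":
    # With the default 20 splits, --start 0 --stop 2 covers the first two splits.
    print(train_split_file_names(20, start=0, stop=2))
```

Because `--start` and `--stop` are 0-based while the file names are 1-based, running with `--start 0 --stop 2` produces the files numbered 01 and 02.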
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../librispeech/ASR/local/compute_fbank_musan.py
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../ST/local/cuts_validate.py
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../librispeech/ASR/local/display_manifest_statistics.py
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../librispeech/ASR/local/filter_cuts.py
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
../../../librispeech/ASR/local/generate_unique_lexicon.py
