Commit 2d2470b

fix the CTC zipformer2 training (#1713)
- too many supervision tokens
- change filtering rule to `if (T - 2) < len(tokens): return False`
- this prevents inf. from appearing in the CTC loss value
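
For context on the inf. mentioned in the commit message: a CTC alignment has to place a blank between adjacent identical tokens, so a token sequence can only be aligned when the number of output frames is at least the number of tokens plus the number of adjacent repeats. When no alignment exists, the loss is infinite. The sketch below is a generic illustration of that property, not code from this commit; `min_ctc_frames` and `has_valid_ctc_alignment` are hypothetical helper names, and the commit itself uses a fixed margin of 2 rather than counting repeats.

```python
from typing import Sequence


def min_ctc_frames(tokens: Sequence[str]) -> int:
    """Smallest number of frames for which a CTC alignment of `tokens` exists.

    Each token needs one frame, and every pair of adjacent identical
    tokens needs one extra frame for the blank that separates them.
    """
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return len(tokens) + repeats


def has_valid_ctc_alignment(num_frames: int, tokens: Sequence[str]) -> bool:
    """True if at least one CTC alignment fits into `num_frames` frames."""
    return num_frames >= min_ctc_frames(tokens)
```

Under this view, the check `T < len(tokens)` alone can still admit cuts whose token sequence contains repeats and has no valid alignment.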
1 parent 42f3db8 commit 2d2470b

1 file changed

Lines changed: 4 additions & 2 deletions

egs/librispeech/ASR/zipformer/train.py
@@ -1409,9 +1409,11 @@ def remove_short_and_long_utt(c: Cut):
         T = ((c.num_frames - 7) // 2 + 1) // 2
         tokens = sp.encode(c.supervisions[0].text, out_type=str)
 
-        if T < len(tokens):
+        # For CTC `(T - 2) < len(tokens)` is needed. otherwise inf. in loss appears.
+        # For Transducer `T < len(tokens)` was okay.
+        if (T - 2) < len(tokens):
             logging.warning(
-                f"Exclude cut with ID {c.id} from training. "
+                f"Exclude cut with ID {c.id} from training (too many supervision tokens). "
                 f"Number of frames (before subsampling): {c.num_frames}. "
                 f"Number of frames (after subsampling): {T}. "
                 f"Text: {c.supervisions[0].text}. "
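
To see the effect of the changed rule in isolation, here is a minimal standalone sketch of the check. It reuses only the subsampling formula and the two comparisons shown in the diff; the function names `subsampled_frames` and `keep_cut` and the `margin` parameter are assumptions made for illustration and do not appear in `train.py`.

```python
def subsampled_frames(num_frames: int) -> int:
    """Frames after subsampling, using the formula from the diff."""
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int, margin: int = 2) -> bool:
    """Return True if the cut passes the token-count filter.

    margin=2 mirrors the new rule `(T - 2) < len(tokens)`;
    margin=0 mirrors the previous rule `T < len(tokens)`.
    """
    T = subsampled_frames(num_frames)
    if (T - margin) < num_tokens:
        return False  # too many supervision tokens for this many frames
    return True
```

For example, a cut with 100 frames gives `T = 23`. With 22 tokens, `keep_cut(100, 22, margin=0)` keeps the cut, while `keep_cut(100, 22)` excludes it; with 21 tokens both rules keep it.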