I'm fine-tuning a fairseq-based pretrained model, using the 10h fine-tuning config and the base.pt model.
data.tsv is populated, and the audio is 16 kHz.
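For completeness, here is the shape of my data.tsv, assuming the usual fairseq manifest layout (root directory on the first line, then one `relative-path<TAB>length` pair per utterance). The paths and lengths below are placeholders; I'm reading the lengths as frame counts, given the min/max_sample_size values in the config:

```
/root/asr/dataset/wav
001.wav	600
002.wav	210
003.wav	500
```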
The modified parts of the config file are as follows:
```yaml
checkpoint:
  save_interval_updates: 1000   # save a checkpoint every N updates
  keep_interval_updates: 10     # number of update-based checkpoints to keep
  keep_last_epochs: 3           # number of epoch checkpoints to keep
  no_epoch_checkpoints: false   # whether to disable per-epoch checkpoints
  no_last_checkpoints: false    # whether to disable saving the last checkpoint
  no_save: false                # whether to disable saving entirely
  best_checkpoint_metric: uer
task:
  _name: spec_finetuning
  data: ???
  min_sample_size: 100
  max_sample_size: 4000
  normalize: true
  target_dictionary: /root/asr/dataset/wav/dict
  labels: ltr
```
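Since `target_dictionary` points at /root/asr/dataset/wav/dict with `labels: ltr`, one thing I double-checked is the dictionary layout. A minimal sketch of what I'm assuming it should look like (the standard fairseq format, one `symbol count` pair per line; the symbols and counts here are made up):

```
| 300
今 120
天 95
一 80
```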
The last few training log entries are as follows:
```
[2025-06-15 09:30:37,610][train][INFO] - {"epoch": 160, "train_loss": "36.603", "train_ntokens": "1741.08", "train_nsentences": "543.04", "train_nll_loss": "11.416", "train_wps": "1473.3", "train_ups": "0.85", "train_wpb": "1741.1", "train_bsz": "543", "train_num_updates": "3993", "train_lr": "4.04216e-06", "train_gnorm": "6.184", "train_clip": "44", "train_loss_scale": "1", "train_train_wall": "14", "train_gb_free": "4.5", "train_wall": "666"}
[2025-06-15 09:30:37,611][fairseq.tasks.fairseq_task][INFO] - can_reuse_epoch_itr = True
[2025-06-15 09:30:37,777][fairseq.tasks.fairseq_task][INFO] - creating new batches for epoch 161
[2025-06-15 09:30:37,833][fairseq.data.iterators][INFO] - grouped total_num_itrs = 25
[2025-06-15 09:30:37,841][fairseq.trainer][INFO] - begin training epoch 161
[2025-06-15 09:30:37,842][fairseq_cli.train][INFO] - Start iterating over samples
[2025-06-15 09:30:42,553][train_inner][INFO] - {"epoch": 161, "update": 160.28, "loss": "36.046", "ntokens": "1734.46", "nsentences": "542.24", "nll_loss": "11.269", "wps": "1464.4", "ups": "0.84", "wpb": "1734.5", "bsz": "542.2", "num_updates": "4000", "lr": "4e-06", "gnorm": "6.408", "clip": "60", "loss_scale": "1", "train_wall": "110", "gb_free": "4.4", "wall": "671"}
[2025-06-15 09:30:42,553][fairseq_cli.train][INFO] - Stopping training due to num_updates: 4000 >= max_update: 4000
```
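As a side note, the per-epoch stats are JSON after the bracketed prefix, so the loss curve is easy to pull out of a saved log. A small sketch (the file name train.log is a placeholder for wherever the training output was captured):

```python
import json
import re

# Match the JSON payload in lines like:
# [2025-06-15 09:30:37,610][train][INFO] - {"epoch": 160, ...}
PATTERN = re.compile(r"\[train\]\[INFO\] - (\{.*\})")

with open("train.log") as f:  # placeholder path to the captured training log
    for line in f:
        m = PATTERN.search(line)
        if m:
            stats = json.loads(m.group(1))
            print(stats["epoch"], stats["train_loss"], stats["train_nll_loss"])
```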
After decoding finishes, every HYPO is empty. I added some debug prints in infer.py, and the hypothesis token tensors all come back with length 0:
```python
def process_sample(self, sample: Dict[str, Any]) -> None:
    self.gen_timer.start()
    hypos = self.task.inference_step(
        generator=self.generator,
        models=self.models,
        sample=sample,
    )
    tokens0 = hypos[0][0]["tokens"]
    print(hypos)
    print("DEBUG tokens0:", tokens0)           # print the raw token tensor
    print("DEBUG tokens0 len:", len(tokens0))
    print("DEBUG sample keys:", sample.keys())
```
Here is the inference log:
```
(venv) root@DESKTOP-PG3JMJU:~/asr# ./decode.sh
2025-06-15 10:03:56 | INFO | __main__ | import user_dir: /root/asr/TeleSpeech-ASR/data2vec_dialect
/mnt/d/python_work/speech/venv/lib/python3.10/site-packages/timm/models/layers/__init__.py:48: FutureWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {name} is deprecated, please import via timm.layers", FutureWarning)
[2025-06-15 10:04:11,544][__main__][INFO] - /root/asr/model/checkpoint_161_4000.pt
[2025-06-15 10:04:15,626][data2vec_dialect.models.data2vec2][INFO] - making target model
/root/asr/TeleSpeech-ASR/data2vec_dialect/models/data2vec2.py:360: FutureWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/main/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  state = super().state_dict(destination, prefix, keep_vars)
[2025-06-15 10:04:20,200][data2vec_dialect.data.spec_dataset][INFO] - loaded 3, skipped 0 samples
[2025-06-15 10:04:20,200][data2vec_dialect.data.spec_dataset][INFO] - ATTENTION!!! skip indices: set()
Dataset size: 3
[2025-06-15 10:04:20,212][fairseq.tasks.fairseq_task][INFO] - can_reuse_epoch_itr = True
[2025-06-15 10:04:20,213][fairseq.tasks.fairseq_task][INFO] - reuse_dataloader = True
[2025-06-15 10:04:20,213][fairseq.tasks.fairseq_task][INFO] - rebuild_batches = True
[2025-06-15 10:04:20,213][fairseq.tasks.fairseq_task][INFO] - batches will be rebuilt for each epoch
[2025-06-15 10:04:20,213][fairseq.tasks.fairseq_task][INFO] - creating new batches for epoch 1
  0%|          | 0/1 [00:00<?, ?it/s][[{'tokens': tensor([], dtype=torch.int64), 'score': tensor(-3.2362)}], [{'tokens': tensor([], dtype=torch.int64), 'score': tensor(-1.0267)}], [{'tokens': tensor([], dtype=torch.int64), 'score': tensor(-1.0286)}]]
DEBUG tokens0: tensor([], dtype=torch.int64)
DEBUG tokens0 len: 0
DEBUG sample keys: dict_keys(['id', 'net_input', 'target_lengths', 'ntokens', 'target'])
DEBUG hyp_pieces:
[2025-06-15 10:04:22,614][__main__][INFO] - HYPO:
[2025-06-15 10:04:22,614][__main__][INFO] - REF: 今天是星期二
[2025-06-15 10:04:22,615][__main__][INFO] - ---------------------
DEBUG hyp_pieces:
[2025-06-15 10:04:22,616][__main__][INFO] - HYPO:
[2025-06-15 10:04:22,617][__main__][INFO] - REF: 廿一
[2025-06-15 10:04:22,617][__main__][INFO] - ---------------------
DEBUG hyp_pieces:
[2025-06-15 10:04:22,618][__main__][INFO] - HYPO:
[2025-06-15 10:04:22,618][__main__][INFO] - REF: 一对大学生
[2025-06-15 10:04:22,618][__main__][INFO] - ---------------------
[2025-06-15 10:04:22,631][__main__][INFO] - Processed 3 sentences (0 tokens) in 2.2s (1.34 sentences per second, 0.45 tokens per second)
[2025-06-15 10:04:22,636][__main__][INFO] - Word error rate: 100.0000
```
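Since every hypothesis is an empty tensor that still carries a nonzero score, my suspicion is that the CTC output has collapsed to blank, so the decoder strips everything. A small check I plan to run, assuming a CTC-style model whose output can be normalized to per-frame log-probs via fairseq's get_normalized_probs (blank index 0 is the usual fairseq convention, but an assumption here):

```python
import torch

# Hypothetical sanity check: dump the per-frame argmax of the model output.
# If nearly every frame is the blank index (usually 0 in fairseq CTC), the
# model has collapsed to blank and the decoder will emit empty hypotheses.
model = self.models[0]
with torch.no_grad():
    net_output = model(**sample["net_input"])
    lprobs = model.get_normalized_probs(net_output, log_probs=True)  # (T, B, V) for CTC
preds = lprobs.argmax(dim=-1)
blank_ratio = (preds == 0).float().mean().item()
print(f"DEBUG blank ratio: {blank_ratio:.3f}")  # ~1.0 means pure-blank output
```

If the blank ratio is close to 1.0, that would point at the training side (e.g. too few updates, or a dictionary mismatch between fine-tuning and decoding) rather than at the decoding script itself.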