Commit 0d388dc

fix: Update generation arguments in constant.py and server.py to include eos_token_id and pad_token_id, and improve handling of bad_words_ids in GlobalModelManager.
1 parent 36b5e9b commit 0d388dc

2 files changed

Lines changed: 5 additions & 8 deletions

utils/constant.py

Lines changed: 2 additions & 0 deletions
@@ -71,6 +71,8 @@
     'max_new_tokens': 512,
     'repetition_penalty': 1.2,
     'length_penalty': 1.0,
+    'eos_token_id': 151645,
+    'pad_token_id': 151643,
 }

 MAX_HISTORY_TURNS = 8 # Keep only the latest 8 rounds of conversation (16 messages: 8 user + 8 assistant)
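
The two added entries hard-code token IDs, so they silently go stale if the checkpoint or tokenizer changes. A minimal sketch of a consistency check follows. It is not part of this commit: `check_special_token_ids` and `StubTokenizer` are hypothetical names, and pairing 151643 with `<|endoftext|>` is an assumption (the removed server.py code only confirms that `<|im_end|>` was used as an EOS token). With a real tokenizer, pass `self.processor.tokenizer` instead of the stub.

```python
from typing import Dict, Tuple


def check_special_token_ids(tokenizer, expected: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {token: (expected_id, actual_id)} for every token whose ID differs."""
    mismatches = {}
    for token, expected_id in expected.items():
        actual_id = tokenizer.convert_tokens_to_ids(token)
        if actual_id != expected_id:
            mismatches[token] = (expected_id, actual_id)
    return mismatches


class StubTokenizer:
    """Stand-in for a real tokenizer so the sketch runs without model files."""

    def __init__(self, vocab: Dict[str, int]):
        self._vocab = vocab

    def convert_tokens_to_ids(self, token: str) -> int:
        return self._vocab[token]


# Assumed token strings for the hard-coded IDs (see the note above).
EXPECTED = {"<|im_end|>": 151645, "<|endoftext|>": 151643}

matching = StubTokenizer({"<|im_end|>": 151645, "<|endoftext|>": 151643})
drifted = StubTokenizer({"<|im_end|>": 151645, "<|endoftext|>": 7})

ok_result = check_special_token_ids(matching, EXPECTED)
bad_result = check_special_token_ids(drifted, EXPECTED)
```

Running such a check once at startup would turn a silent ID mismatch into an explicit error or warning.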

web_demo/server/server.py

Lines changed: 3 additions & 8 deletions
@@ -182,15 +182,10 @@ def initialize(self, model_path: str, target_sample_rate: int = 16000):
         ).to(self.device)

         # set gen args
-        self.gen_kwargs = DEFAULT_S2M_GEN_KWARGS
-        if self.gen_kwargs['eos_token_id'] is None:
-            self.gen_kwargs['eos_token_id'] = [self.processor.tokenizer.eos_token_id,
-                                               self.processor.tokenizer.convert_tokens_to_ids("<|im_end|>")]
-        if self.gen_kwargs['pad_token_id'] is None:
-            self.gen_kwargs['pad_token_id'] = self.processor.tokenizer.pad_token_id
-        if self.gen_kwargs['bad_words_ids'] is None:
+        self.gen_kwargs = DEFAULT_S2M_GEN_KWARGS.copy()
+        if 'bad_words_ids' not in self.gen_kwargs or self.gen_kwargs['bad_words_ids'] is None:
             self.gen_kwargs['bad_words_ids'] = [[self.processor.tokenizer.convert_tokens_to_ids('<|audio_bos|>'),
-                                                self.processor.tokenizer.convert_tokens_to_ids('<|sil|>')]]
+                                                 self.processor.tokenizer.convert_tokens_to_ids('<|sil|>')]]

         self.model.sp_gen_kwargs = DEFAULT_SP_GEN_KWARGS.copy()
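
Two behaviors change in this hunk: the instance now works on a copy of the module-level defaults, and the `bad_words_ids` lookup no longer raises `KeyError` when the key is absent. The sketch below illustrates both with plain dictionaries. `build_gen_kwargs`, `StubTokenizer`, the `DEFAULTS` dict, and the token IDs 1 and 2 are hypothetical and only serve the illustration. Note that `dict.copy()` is shallow, so nested lists would still be shared with the defaults.

```python
from typing import Any, Dict


class StubTokenizer:
    """Stand-in for a real tokenizer; the IDs used below are made up."""

    def __init__(self, vocab: Dict[str, int]):
        self._vocab = vocab

    def convert_tokens_to_ids(self, token: str) -> int:
        return self._vocab[token]


def build_gen_kwargs(defaults: Dict[str, Any], tokenizer) -> Dict[str, Any]:
    """Copy the defaults, then fill in bad_words_ids if missing or None."""
    gen_kwargs = defaults.copy()  # shallow copy: top-level keys are independent
    # .get() covers both "key absent" and "key present but None".
    if gen_kwargs.get("bad_words_ids") is None:
        gen_kwargs["bad_words_ids"] = [[
            tokenizer.convert_tokens_to_ids("<|audio_bos|>"),
            tokenizer.convert_tokens_to_ids("<|sil|>"),
        ]]
    return gen_kwargs


# Hypothetical defaults that, like the updated constant.py, omit bad_words_ids.
DEFAULTS = {"max_new_tokens": 512, "eos_token_id": 151645, "pad_token_id": 151643}
stub = StubTokenizer({"<|audio_bos|>": 1, "<|sil|>": 2})

kwargs = build_gen_kwargs(DEFAULTS, stub)

# Without a copy, the same assignment would leak into the shared defaults.
shared_defaults = {"max_new_tokens": 512}
aliased = shared_defaults            # same object, as in the removed line
aliased["bad_words_ids"] = [[1, 2]]  # mutates shared_defaults too
```

Here `kwargs` carries the filled-in `bad_words_ids` while `DEFAULTS` stays untouched, whereas `shared_defaults` ends up modified through its alias.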
