The mmlu_pro score for Qwen2.5-32B-Instruct is very low (8.01).
When I changed generate_prompt() in the MMLUProTaskHandler class to build the prompt the same way as generate_prompt() in the MMLUTaskHandler class, i.e. appending the multiple-choice answers to the question, the score increased to 57.80.
skythought/evals/tasks/mmlu/mmlu_handler.py
class MMLUProTaskHandler(MMLUTaskHandler):
    def generate_prompt(self, prompt):
        multiple_choice_string = self.get_multiple_choice_answers(prompt)  # ADDED
        prompt = prompt["question"] + "\n" + multiple_choice_string  # ADDED
        return self.task_config.templating_parameters["template"].format(prompt=prompt)
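
To illustrate why this change could matter, here is a minimal standalone sketch that does not use the skythought code. It assumes each dataset row has a "question" string and an "options" list; TEMPLATE, format_choices() and build_prompt() are hypothetical names made up for this example, and the real get_multiple_choice_answers() may format the choices differently.

```python
from string import ascii_uppercase

# Hypothetical template; the real one comes from the task config.
TEMPLATE = "Answer the following multiple-choice question.\n{prompt}"


def format_choices(options):
    """Label each option A, B, C, ... with one option per line."""
    return "\n".join(
        f"{letter}. {text}" for letter, text in zip(ascii_uppercase, options)
    )


def build_prompt(row, include_choices=True):
    """Build the prompt text, optionally appending the answer choices."""
    prompt = row["question"]
    if include_choices:
        prompt += "\n" + format_choices(row["options"])
    return TEMPLATE.format(prompt=prompt)


# Assumed row layout: a question string plus a list of option strings.
row = {"question": "What is 2 + 2?", "options": ["3", "4", "5"]}

print(build_prompt(row, include_choices=False))  # question only
print("---")
print(build_prompt(row))  # question followed by the labeled choices
```

With include_choices=False the model only sees the question text and has no options to pick from, which would be consistent with the very low score I observed before the change.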