Thanks for your impressive work.
I have trained mistral VM following PRM/train_VM_mistral.py and use it to guide ToT evaluation in evaluate.py. But after training 2 epochs following recommended setting, the test accuracy is only 0.1582. And it outputs unreliable scores to tree nodes.
Since the default depth and branch is limited and exploring nodes are ranked by values, an relatively accurate score seems necessary. So I wonder if this is a normal situation, and how do you handle this problem?
Looking forward for your help, thanks.
Thanks for your impressive work.
I have trained mistral VM following
PRM/train_VM_mistral.pyand use it to guide ToT evaluation inevaluate.py. But after training 2 epochs following recommended setting, the test accuracy is only 0.1582. And it outputs unreliable scores to tree nodes.Since the default depth and branch is limited and exploring nodes are ranked by values, an relatively accurate score seems necessary. So I wonder if this is a normal situation, and how do you handle this problem?
Looking forward for your help, thanks.