@paul-gauthier - I'm really inspired by this benchmark!
In your blog post, you mentioned: "The new benchmark uses the 225 problems that were solved by 3 or fewer models."
Do you have the data on which problems were solved by which models? I looked here, but it only seems to contain the summaries.
It would be helpful to see this data so I can partition the benchmark into easy/medium/hard problems. I'm also interested in running optimizations to get a specific model to overcome problems it previously got wrong, without having to rerun the whole benchmark every time, which for some models is expensive.
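For context, here is a minimal sketch of the kind of partitioning I have in mind, assuming the per-problem data were available as a mapping from problem ID to the set of models that solved it (the data format, thresholds, and problem/model names below are all hypothetical, not from the actual benchmark):

```python
# Hypothetical per-problem results: problem ID -> set of models that solved it.
# In practice this would be loaded from whatever raw data the benchmark exposes.
results = {
    "prob_a": {"model-1", "model-2", "model-3"},
    "prob_b": {"model-1"},
    "prob_c": set(),
    "prob_d": {"model-1", "model-2", "model-3", "model-4"},
}

def partition(results, easy_min=4, hard_max=1):
    """Bucket problems by how many models solved them.

    easy:   solved by >= easy_min models
    hard:   solved by <= hard_max models
    medium: everything in between
    (Thresholds are illustrative, not from the benchmark.)
    """
    buckets = {"easy": [], "medium": [], "hard": []}
    for problem, solvers in results.items():
        n = len(solvers)
        if n >= easy_min:
            buckets["easy"].append(problem)
        elif n <= hard_max:
            buckets["hard"].append(problem)
        else:
            buckets["medium"].append(problem)
    return buckets

print(partition(results))
```

With per-problem data like this, it would also be easy to select just the "hard" subset for a given model and rerun only those problems.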
Thanks!