Duration metrics #6

I think some metrics regarding the total and average time it took each model to solve the tasks would be a helpful addition to the leaderboard, especially for reasoning models.

Looking only at the percentage completed and the cost involved, I could get the wrong impression that model A is a better choice than model B, even though model B might realistically be a much better fit for everyday use because it solves tasks in 1/10th of the time.
