Inference source(s) for the tests.
Something like OpenRouter? Or benchmark some of the smaller models locally?
Add models based on what's hot or getting chatter on the net? Open to suggestions!
When we add a model, we can test just that model against the current period's benchmark dataset.
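
A rough sketch of what "test just that model" could look like, assuming results are stored keyed by (period, model). Everything here is hypothetical: the names `models_to_run` and `run_missing`, the shape of the results store, and the `evaluate` callback that would wrap whatever inference source we end up using.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumption: scores are kept in a dict keyed by (period, model name).
Results = Dict[Tuple[str, str], float]


def models_to_run(period: str, registered: Iterable[str], results: Results) -> List[str]:
    """Return the registered models that have no score yet for this period."""
    return [m for m in registered if (period, m) not in results]


def run_missing(
    period: str,
    registered: Iterable[str],
    results: Results,
    evaluate: Callable[[str, str], float],
) -> List[str]:
    """Evaluate only the models missing a score for `period`.

    `evaluate(model, period)` is a placeholder for the real benchmark run.
    Updates `results` in place and returns the models that were run.
    """
    ran = models_to_run(period, registered, results)
    for model in ran:
        results[(period, model)] = evaluate(model, period)
    return ran
```

With this shape, adding a model mid-period means appending it to the registered list and calling `run_missing` again; models that already have a score for the period are skipped.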